This issue seems to be Pipeline-specific (actually DurableTask-specific). 
Standard Freestyle jobs should abort immediately on agent 
disconnection, but Pipeline jobs may recover and continue using the 
workspace.

> However, it seems ugly to use the new channel in the "in use" map, because 
> the job is still technically running under the old channel.


No, it should be running under the new channel. The old channel gets 
disposed, and Remoting 3.14+ adds some diagnostics for these cases (e.g. 
JENKINS-45294 <https://issues.jenkins-ci.org/browse/JENKINS-45294>). This 
now causes some issues in Durable Task, which does not always recreate the 
FilePath and the underlying Workspace (JENKINS-41854 
<https://issues.jenkins-ci.org/browse/JENKINS-41854> and other similar 
issues with "Channel is closing or closed").

WorkspaceList#inUse should definitely be reacquired by Pipeline when it 
reconnects to a new agent. I would guess this already happens (or not?), 
but there is clearly a potential for race conditions between recovered 
jobs and new submissions.

The proposed patch may help, although workspace management is not really 
the strongest part of the Jenkins core. I would rather suggest redesigning 
it so that workspaces can be tracked independently of the node state (the 
proposed change does the same for a single cache). Better UI and workspace 
release features could then be added on top as a bonus.

BR, Oleg
 

On Monday, April 2, 2018 at 10:08:28 PM UTC+2, m...@basilcrow.com wrote:
>
> Hi Ivan,
>
> Thanks for your reply. I'm not exactly sure how my proposed workaround 
> would necessarily cause concurrency issues. Doesn't that depend on how it's 
> implemented? I agree that it's strange that the agent wasn't disconnected 
> and still keeps the old connection to the master, even though new jobs use 
> a new connection. Doesn't this violate the invariant implied by the 
> implementation of WorkspaceList#inUse, which is that the entries in the map 
> always represent the latest channel for a given node? This definitely seems 
> like a core bug to me. I don't believe I should need to tune my TCP stack, 
> because pipeline claims to be resilient to network outages. If the master 
> logs "SEVERE: I/O error in channel jenkins-node" and "INFO: Attempting to 
> reconnect jenkins-node", then why do jobs continue running on the old 
> connection, violating the invariant in WorkspaceList#inUse?
>
> Thanks,
> Basil
>
> On Sunday, April 1, 2018 at 6:47:02 AM UTC-7, Ivan Fernandez Calvo wrote:
>>
>> The proposed workaround could cause concurrency issues. I think the 
>> main issue is why the agent is not disconnected and keeps the old 
>> connection; that is the most important thing. Did you check the open 
>> connections from the agent to the master with netstat? There should be 
>> two connections, the old one and a new one. Does the agent have more 
>> than one slave.jar process running? Are your agents VMs or bare metal? 
>> Did you tune your TCP stack with proper keepalive values?
>
>
