I meant "Thanks for pointing out JENKINS-41854" below.

On Tuesday, April 10, 2018 at 5:42:20 PM UTC-7, m...@basilcrow.com wrote:
>
> Thanks for pointing out JENKINS-45294. That is exactly what I am facing, 
> at least twice a month. It causes severe disruption to my users, so I need 
> to come up with a plan. I see that the bug is unassigned. If it isn't fixed 
> soon, I might have to try to fix it myself by necessity. I suppose the best 
> way to start would be by writing a test case that triggers the issue. Does 
> the JenkinsRule test harness provide any functionality for setting up this 
> kind of scenario? I see there are some existing tests that restart Jenkins, 
> but I'm not sure how to write an automated test that makes a node 
> disconnect and reconnect in the manner described in the bug. Any advice or 
> pointers to existing code or tests would be appreciated.
>
> On Wednesday, April 4, 2018 at 1:26:29 AM UTC-7, Oleg Nenashev wrote:
>>
>> This issue seems to be Pipeline-specific (actually DurableTask-specific). 
>> Standard Freestyle jobs should abort immediately on the agent 
>> disconnection, but Pipeline jobs may recover and continue using the 
>> workspace.
>>
>>> However, it seems ugly to use the new channel in the "in use" map, 
>>> because the job is still technically running under the old channel.
>>
>>
>> No, it should be running under the new channel. The old channel gets 
>> disposed, and Remoting 3.14+ adds some diagnostics for these cases (e.g. 
>> JENKINS-45294 <https://issues.jenkins-ci.org/browse/JENKINS-45294>). Currently 
>> this causes some issues in Durable Task, which does not always recreate the 
>> FilePath and underlying workspace (JENKINS-41854 
>> <https://issues.jenkins-ci.org/browse/JENKINS-41854> and other similar 
>> issues with "Channel is closing or closed").
>>
>> WorkspaceList#inUse should certainly be reacquired by Pipeline when it 
>> reconnects to a new agent. I would guess this happens even now (or not?), but 
>> there is clearly potential for race conditions between recovered jobs and 
>> new submissions.
>>
>> The proposed patch may help, although workspace management is not really 
>> the strongest part of the Jenkins core. I would rather suggest redesigning 
>> it so that workspaces can be tracked independently of the node state (the 
>> proposed change does the same for a single cache). Better UI and workspace 
>> release features could be added as a bonus.
>>
>> BR, Oleg
>>  
>>
>> On Monday, April 2, 2018 at 10:08:28 PM UTC+2, m...@basilcrow.com wrote:
>>>
>>> Hi Ivan,
>>>
>>> Thanks for your reply. I'm not exactly sure how my proposed workaround 
>>> would necessarily cause concurrency issues. Doesn't that depend on how it's 
>>> implemented? I agree that it's strange that the agent wasn't disconnected 
>>> and still keeps the old connection to the master, even though new jobs use 
>>> a new connection. Doesn't this violate the invariant implied by the 
>>> implementation of WorkspaceList#inUse, which is that the entries in the map 
>>> always represent the latest channel for a given node? This definitely seems 
>>> like a core bug to me. I don't believe I should need to tune my TCP stack, 
>>> because Pipeline claims to be resilient to network outages. If the master 
>>> logs "SEVERE: I/O error in channel jenkins-node" and "INFO: Attempting to 
>>> reconnect jenkins-node", then why do jobs continue running on the old 
>>> connection, violating the invariant in WorkspaceList#inUse?
>>>
>>> Thanks,
>>> Basil
>>>
>>> On Sunday, April 1, 2018 at 6:47:02 AM UTC-7, Ivan Fernandez Calvo wrote:
>>>>
>>>> The proposed workaround could cause concurrency issues. I think the main 
>>>> question is why the agent is not disconnected and keeps the old 
>>>> connection; that is the most important thing. Did you check the open 
>>>> connections from the agent to the master with netstat? There should be two 
>>>> connections, the old one and a new one. Does the agent have more than one 
>>>> slave.jar process running? Are your agents VMs or bare metal? Did you tune 
>>>> your TCP stack with proper keepalive values?
>>>
>>>
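
To make the invariant question in this thread concrete, here is a toy Java model of an "in use" map. It is not Jenkins code: InUseSketch, Handle, ByHandle and ByPath are all made-up names, and the assumption that a lease key's equality includes the channel is mine, for illustration only. It only shows why a key that includes the channel lets the same directory be leased twice after a reconnect, while a key based on the path alone does not.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Toy model only; none of these types are Jenkins classes. */
public class InUseSketch {

    /** Hypothetical workspace handle: a channel id plus a remote path. */
    static final class Handle {
        final String channelId;
        final String remotePath;

        Handle(String channelId, String remotePath) {
            this.channelId = channelId;
            this.remotePath = remotePath;
        }

        // Assumption for this sketch: equality covers the channel AND the path.
        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Handle)) return false;
            Handle h = (Handle) o;
            return channelId.equals(h.channelId) && remotePath.equals(h.remotePath);
        }

        @Override
        public int hashCode() {
            return Objects.hash(channelId, remotePath);
        }
    }

    /** Lease map keyed by handle: the same path over a new channel looks free. */
    static final class ByHandle {
        private final Map<Handle, String> inUse = new HashMap<>();

        /** Returns true if the lease was granted. */
        boolean acquire(Handle h, String owner) {
            return inUse.putIfAbsent(h, owner) == null;
        }
    }

    /** Lease map keyed by path only: a reconnect does not change the key. */
    static final class ByPath {
        private final Map<String, String> inUse = new HashMap<>();

        /** Returns true if the lease was granted. */
        boolean acquire(Handle h, String owner) {
            return inUse.putIfAbsent(h.remotePath, owner) == null;
        }
    }

    public static void main(String[] args) {
        Handle before = new Handle("channel-1", "/ws/job"); // job started here
        Handle after = new Handle("channel-2", "/ws/job");  // same dir after reconnect

        ByHandle byHandle = new ByHandle();
        byHandle.acquire(before, "recovered build");
        System.out.println("by handle, second lease granted: "
                + byHandle.acquire(after, "new build"));

        ByPath byPath = new ByPath();
        byPath.acquire(before, "recovered build");
        System.out.println("by path, second lease granted: "
                + byPath.acquire(after, "new build"));
    }
}
```

In this model the handle-keyed map grants the second lease and the path-keyed map refuses it. Whether the real WorkspaceList behaves like either variant has to be checked against the Jenkins core source.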
