Issue Type: Bug
Assignee: Unassigned
Components: remoting
Created: 26/Nov/14 2:30 PM
Description:

The slave stops responding for an unknown reason.
First, the ping thread notices that the slave does not respond:

Nov 26, 2014 2:00:59 PM INFO hudson.slaves.ChannelPinger$1 onDead
Ping failed. Terminating the channel.
java.util.concurrent.TimeoutException: Ping started on 1417006619758 hasn't completed at 1417006859758
at hudson.remoting.PingThread.ping(PingThread.java:120)
at hudson.remoting.PingThread.run(PingThread.java:81)

(It would be nice if this message showed which channel/slave failed.)
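One way to meet that request is to put the channel name into the log line. The sketch below is hypothetical: the class and method names are invented for illustration and this is not the actual hudson.slaves.ChannelPinger code. How the name is obtained from the channel is left open; the thread dump further down prints the channel as `hudson.remoting.Channel@3b91e986:DWI01164`, which suggests the channel already carries the node name.

```java
import java.util.concurrent.TimeoutException;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch, not the real ChannelPinger: include the channel name
// in the ping-failure log message so the failing slave can be identified.
class PingFailureLogging {
    private static final Logger LOGGER = Logger.getLogger(PingFailureLogging.class.getName());

    // Builds the message; channelName is whatever identifies the channel/slave (assumed input).
    static String pingFailedMessage(String channelName) {
        return "Ping failed for channel " + channelName + ". Terminating the channel.";
    }

    // Called when the ping times out; logs at INFO with the cause attached,
    // the same level as the original "Ping failed" message.
    static void onDead(String channelName, Throwable cause) {
        LOGGER.log(Level.INFO, pingFailedMessage(channelName), cause);
    }

    public static void main(String[] args) {
        onDead("DWI01164",
                new TimeoutException("Ping started on 1417006619758 hasn't completed at 1417006859758"));
    }
}
```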

A bit later:

Nov 26, 2014 2:14:34 PM WARNING hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor monitor
Failed to monitor SV-ARG-DEV-D23 for Clock Difference
hudson.remoting.ChannelClosedException: channel is already closed
at hudson.remoting.Channel.send(Channel.java:541)
at hudson.remoting.Request.callAsync(Request.java:208)
at hudson.remoting.Channel.callAsync(Channel.java:766)
at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:76)
at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:280)
Caused by: java.io.IOException
at hudson.remoting.Channel.close(Channel.java:1027)
at hudson.slaves.ChannelPinger$1.onDead(ChannelPinger.java:110)
at hudson.remoting.PingThread.ping(PingThread.java:120)
at hudson.remoting.PingThread.run(PingThread.java:81)
Caused by: java.util.concurrent.TimeoutException: Ping started on 1417006619758 hasn't completed at 1417006859758
... 2 more
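Both traces carry the same TimeoutException values, so the channel that the node monitor found closed is the one the ping thread closed. The two numbers are epoch milliseconds; a short java.time sketch decodes them. The ping ran for exactly 240 s (four minutes) before it was declared dead, and the end time matches the 2:00:59 PM log line if the master runs at UTC+01:00, which is an assumption since the report does not state the time zone.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;

// Decodes the epoch-millisecond values from
// "Ping started on <start> hasn't completed at <end>".
class PingTimestamps {
    static final long START_MS = 1417006619758L;
    static final long END_MS = 1417006859758L;

    // Time between the start of the ping and the moment it was declared dead.
    static Duration elapsed(long startMs, long endMs) {
        return Duration.ofMillis(endMs - startMs);
    }

    // Renders an epoch-millis value at a fixed UTC offset.
    static OffsetDateTime at(long epochMs, ZoneOffset offset) {
        return Instant.ofEpochMilli(epochMs).atOffset(offset);
    }

    public static void main(String[] args) {
        System.out.println(Instant.ofEpochMilli(START_MS)); // 2014-11-26T12:56:59.758Z
        System.out.println(Instant.ofEpochMilli(END_MS));   // 2014-11-26T13:00:59.758Z
        System.out.println(elapsed(START_MS, END_MS));       // PT4M
        // +01:00 is an assumed master time zone; the report does not state it.
        System.out.println(at(END_MS, ZoneOffset.ofHours(1))); // 2014-11-26T14:00:59.758+01:00
    }
}
```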

Yet the node is not shown as offline, and its executor still shows as busy.
A thread dump on the master shows the executor thread:
Executor #0 for DWI01164 : executing OKA.R201501/Team D/_publics/Public - 4 #103 / waiting for hudson.remoting.Channel@3b91e986:DWI01164

"Executor #0 for DWI01164 : executing OKA.R201501/Team D/_publics/Public - 4 #103 / waiting for hudson.remoting.Channel@3b91e986:DWI01164" Id=16260 Group=main TIMED_WAITING on hudson.remoting.UserRequest@e4ca0c4
    at java.lang.Object.wait(Native Method)
    -  waiting on hudson.remoting.UserRequest@e4ca0c4
    at hudson.remoting.Request.call(Request.java:146)
    at hudson.remoting.Channel.call(Channel.java:739)
    at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:168)
    at com.sun.proxy.$Proxy56.join(Unknown Source)
    at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:956)
    at hudson.Launcher$ProcStarter.join(Launcher.java:367)
    at hudson.tasks.Maven.perform(Maven.java:328)
    at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
    at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:756)
    at hudson.model.Build$BuildExecution.build(Build.java:198)
    at hudson.model.Build$BuildExecution.doRun(Build.java:159)
    at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:529)
    at hudson.model.Run.execute(Run.java:1706)
    at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
    at hudson.model.ResourceController.execute(ResourceController.java:88)
    at hudson.model.Executor.run(Executor.java:232)

It looks like ChannelPinger calls channel.close(), which sets channel.outClosed, yet Request is stuck in a loop that checks channel.isInClosed(), which was not set.
Should Request check channel.isClosingOrClosed() instead?
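To make the suspected mechanism concrete, here is a simplified, hypothetical model in Java. It reuses the names from the report (outClosed, isInClosed(), isClosingOrClosed()) but is not the hudson.remoting source: the class layout, the method bodies and the maxWaits bound (added only so the demo terminates) are assumptions. With the current condition the wait loop keeps going once only the outbound side is closed; with the proposed isClosingOrClosed() check it exits immediately.

```java
// Simplified, hypothetical model of the hang described in the report.
// Names mirror the report; this is NOT the hudson.remoting implementation.
class HalfClosedChannelDemo {

    static final class Channel {
        private volatile boolean inClosed;  // inbound (reader) side closed
        private volatile boolean outClosed; // outbound (writer) side closed

        // In this model, close() from the ping thread marks only the outbound side,
        // matching the report: outClosed is set while isInClosed() stays false.
        void close() {
            outClosed = true;
        }

        boolean isInClosed() {
            return inClosed;
        }

        boolean isClosingOrClosed() {
            return inClosed || outClosed;
        }
    }

    // Loop condition as described in the report.
    static boolean keepWaitingCurrent(Object response, Channel ch) {
        return response == null && !ch.isInClosed();
    }

    // Condition proposed in the report.
    static boolean keepWaitingProposed(Object response, Channel ch) {
        return response == null && !ch.isClosingOrClosed();
    }

    // Runs the wait loop for a response that never arrives. maxWaits bounds the
    // loop only so the demo terminates; returns the number of timed waits made.
    static int simulate(Channel ch, boolean proposed, int maxWaits, long waitMillis) {
        Object response = null; // the unresponsive slave never answers
        Object lock = new Object();
        int waits = 0;
        synchronized (lock) {
            while (waits < maxWaits
                    && (proposed ? keepWaitingProposed(response, ch)
                                 : keepWaitingCurrent(response, ch))) {
                try {
                    lock.wait(waitMillis); // timed wait, like the TIMED_WAITING executor thread
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
                waits++;
            }
        }
        return waits;
    }

    public static void main(String[] args) {
        Channel ch = new Channel();
        ch.close(); // the ping thread gives up and closes the channel
        System.out.println("current check:  " + simulate(ch, false, 5, 10) + " waits, still stuck");
        System.out.println("proposed check: " + simulate(ch, true, 5, 10) + " waits");
    }
}
```

The report describes no such bound on the real loop, which is consistent with the executor thread sitting in TIMED_WAITING at Request.call while the node still looks busy.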

Environment: LTS 1.565.2
slave 2.48
Project: Jenkins
Priority: Major
Reporter: Wannes Sels