Issue Type: Bug
Assignee: Unassigned
Components: remoting
Created: 26/Nov/14 2:30 PM
Description:

The slave stops responding for whatever reason.
First, the ping thread notices that the slave does not respond:

Nov 26, 2014 2:00:59 PM INFO hudson.slaves.ChannelPinger$1 onDead
Ping failed. Terminating the channel.
java.util.concurrent.TimeoutException: Ping started on 1417006619758 hasn't completed at 1417006859758
at hudson.remoting.PingThread.ping(PingThread.java:120)
at hudson.remoting.PingThread.run(PingThread.java:81)

(It would be nice if this message showed which channel/slave is affected.)

A bit later:

Nov 26, 2014 2:14:34 PM WARNING hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor monitor
Failed to monitor SV-ARG-DEV-D23 for Clock Difference
hudson.remoting.ChannelClosedException: channel is already closed
at hudson.remoting.Channel.send(Channel.java:541)
at hudson.remoting.Request.callAsync(Request.java:208)
at hudson.remoting.Channel.callAsync(Channel.java:766)
at hudson.node_monitors.AbstractAsyncNodeMonitorDescriptor.monitor(AbstractAsyncNodeMonitorDescriptor.java:76)
at hudson.node_monitors.AbstractNodeMonitorDescriptor$Record.run(AbstractNodeMonitorDescriptor.java:280)
Caused by: java.io.IOException
at hudson.remoting.Channel.close(Channel.java:1027)
at hudson.slaves.ChannelPinger$1.onDead(ChannelPinger.java:110)
at hudson.remoting.PingThread.ping(PingThread.java:120)
at hudson.remoting.PingThread.run(PingThread.java:81)
Caused by: java.util.concurrent.TimeoutException: Ping started on 1417006619758 hasn't completed at 1417006859758
... 2 more

Yet the node is not shown as offline, and its executor still shows as busy.
A thread dump shows the executor thread on the master:
Executor #0 for DWI01164 : executing OKA.R201501/Team D/_publics/Public - 4 #103 / waiting for hudson.remoting.Channel@3b91e986:DWI01164

"Executor #0 for DWI01164 : executing OKA.R201501/Team D/_publics/Public - 4 #103 / waiting for hudson.remoting.Channel@3b91e986:DWI01164" Id=16260 Group=main TIMED_WAITING on hudson.remoting.UserRequest@e4ca0c4
    at java.lang.Object.wait(Native Method)
    - waiting on hudson.remoting.UserRequest@e4ca0c4
    at hudson.remoting.Request.call(Request.java:146)
    at hudson.remoting.Channel.call(Channel.java:739)
    at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:168)
    at com.sun.proxy.$Proxy56.join(Unknown Source)
    at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:956)
    at hudson.Launcher$ProcStarter.join(Launcher.java:367)
    at hudson.tasks.Maven.perform(Maven.java:328)
    at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
    at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:756)
    at hudson.model.Build$BuildExecution.build(Build.java:198)
    at hudson.model.Build$BuildExecution.doRun(Build.java:159)
    at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:529)
    at hudson.model.Run.execute(Run.java:1706)
    at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
    at hudson.model.ResourceController.execute(ResourceController.java:88)
    at hudson.model.Executor.run(Executor.java:232)

It looks like ChannelPinger calls channel.close(), which sets channel.outClosed, yet Request is stuck in a loop checking channel.isInClosed(), which was never set.
Should Request check channel.isClosingOrClosed() instead?
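
To illustrate the suspected mismatch, here is a minimal sketch using a toy model, not the actual remoting code. ToyChannel, waitAborted and the poll count are hypothetical names invented for this example, and the real Request.call() blocks in wait() rather than polling. The sketch only assumes what is described above: close() marks the outgoing side as closed, while the waiting side checks the incoming side.

```java
import java.util.function.BooleanSupplier;

public class ChannelCloseSketch {
    /** Toy stand-in for a remoting channel; NOT the real hudson.remoting.Channel. */
    static class ToyChannel {
        private volatile boolean inClosed;   // incoming (reader) side shut down
        private volatile boolean outClosed;  // outgoing (writer) side shut down

        /** Models the behaviour described in the report: only the outgoing side is marked. */
        void close() {
            outClosed = true;
        }

        boolean isInClosed() {
            return inClosed;
        }

        boolean isClosingOrClosed() {
            return inClosed || outClosed;
        }
    }

    /**
     * Polls up to maxPolls times for a response that never arrives.
     * Returns true if the wait was aborted because abortCheck fired,
     * false if the caller would still be waiting after maxPolls polls.
     */
    static boolean waitAborted(BooleanSupplier abortCheck, int maxPolls) {
        for (int i = 0; i < maxPolls; i++) {
            if (abortCheck.getAsBoolean()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        ToyChannel channel = new ToyChannel();
        channel.close(); // what the ping thread does after the timeout

        // Checking only the incoming side: the waiter never notices the close.
        System.out.println("isInClosed check aborts: "
                + waitAborted(channel::isInClosed, 5));

        // Checking either side: the waiter gives up right away.
        System.out.println("isClosingOrClosed check aborts: "
                + waitAborted(channel::isClosingOrClosed, 5));
    }
}
```

In this model the first check keeps waiting and the second aborts, which matches the executor staying busy after the channel was terminated. Whether the same change is safe in the real Request loop would need to be confirmed against the remoting source.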

Environment: LTS 1.565.2
slave 2.48
Project: Jenkins
Priority: Major
Reporter: Wannes Sels