Hi,
I am trying to debug the following symptom:
Jenkins started a slave. The slave then died (the machine hung and never had a
chance to communicate back to the master). Jenkins tries to restart it but is
unable to. When I try to restart the slave manually, nothing happens: the slave
log is empty and stays empty, with the spinning icon running indefinitely.
I had a look at the thread dump and saw a number of threads blocked and waiting
for the following thread:
"Computer.threadPoolForRemoting [#14515]" Id=577044 Group=main WAITING on com.trilead.ssh2.channel.Channel@76dd2191
    at java.lang.Object.wait(Native Method)
    - waiting on com.trilead.ssh2.channel.Channel@76dd2191
    at java.lang.Object.wait(Object.java:503)
    at com.trilead.ssh2.channel.ChannelManager.waitUntilChannelOpen(ChannelManager.java:109)
    at com.trilead.ssh2.channel.ChannelManager.openSessionChannel(ChannelManager.java:583)
    at com.trilead.ssh2.Session.<init>(Session.java:41)
    at com.trilead.ssh2.Connection.openSession(Connection.java:1129)
    - locked com.trilead.ssh2.Connection@1e553bf
    at com.trilead.ssh2.SFTPv3Client.<init>(SFTPv3Client.java:99)
    at com.trilead.ssh2.SFTPv3Client.<init>(SFTPv3Client.java:119)
    at hudson.plugins.sshslaves.SSHLauncher.afterDisconnect(SSHLauncher.java:1160)
    - locked hudson.plugins.sshslaves.SSHLauncher@3c884383
    at hudson.slaves.SlaveComputer$3.run(SlaveComputer.java:547)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)

    Number of locked synchronizers = 1
    - java.util.concurrent.ThreadPoolExecutor$Worker@638f4d22
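For checking this on a live JVM rather than from a saved dump, here is a minimal sketch using only the standard java.lang.management API. It lists every BLOCKED thread together with the monitor it is trying to enter and the thread that currently owns that monitor. The class name BlockedThreadReport is made up for illustration. If my reading is right, the blocked threads on the master would show up as waiting for the SSHLauncher monitor, owned by the thread above, but I have not verified that output on this instance.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class: reports BLOCKED threads and who owns the lock they want.
public class BlockedThreadReport {

    /** Returns one line per BLOCKED thread: name, monitor it is blocked on, and the monitor's owner. */
    public static List<String> blockedThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> lines = new ArrayList<>();
        // Locked-monitor details are not needed: getLockName()/getLockOwnerName()
        // describe the lock the thread is blocked on, which is always reported.
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info.getThreadState() == Thread.State.BLOCKED) {
                lines.add(info.getThreadName() + " BLOCKED on " + info.getLockName()
                        + " owned by " + info.getLockOwnerName());
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        blockedThreads().forEach(System.out::println);
    }
}
```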
Looking at the code for hudson.plugins.sshslaves.SSHLauncher.java, I see no hint
of any code in afterDisconnect that deals with timeouts. Looking further up the
stack, I wonder what happens when openSessionChannel tries to open a channel to
the slave but the slave has died on the other side. The code does not look like
it times out. If that is the case, and whatever is on the other side of the
channel that is supposed to respond is also dead, it seems to me that
waitUntilChannelOpen will never return and will hang forever. Since
afterDisconnect holds the SSHLauncher monitor while it waits (the "- locked
hudson.plugins.sshslaves.SSHLauncher@3c884383" line in the trace), that lock
will never be released, and other threads wanting it will block forever, i.e.
effectively a deadlock.
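To make the suspected pattern concrete, here is a minimal, hypothetical model. It is not the trilead-ssh2 or SSHLauncher code; TimedChannelWait, markOpen, waitUntilOpen and afterDisconnect here are made-up names. An outer monitor (standing in for the SSHLauncher lock) is held while waiting on an inner one (standing in for the Channel). Object.wait() releases only the monitor it is called on, so the outer lock stays held for the entire wait. This variant bounds the wait with a deadline so the outer lock is eventually released. Whether the real libraries offer such a timeout for this step is an assumption I have not checked.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical model of "hold a lock while waiting for a peer that may be dead".
public class TimedChannelWait {
    private final Object channelLock = new Object();
    private boolean open = false; // guarded by channelLock; set when the peer answers

    /** Would be called by the peer's reply handler; never called if the peer is dead. */
    public void markOpen() {
        synchronized (channelLock) {
            open = true;
            channelLock.notifyAll();
        }
    }

    /** Waits at most timeoutMs for the channel to open; returns false on timeout. */
    public boolean waitUntilOpen(long timeoutMs) throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        synchronized (channelLock) {
            while (!open) {
                long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                if (remainingMs <= 0) {
                    return false; // give up instead of waiting forever; wait(0) would never time out
                }
                // Releases channelLock only; any monitor the caller holds stays held.
                channelLock.wait(remainingMs); // loop also covers spurious wakeups
            }
            return true;
        }
    }

    /** Stand-in for a synchronized afterDisconnect(): holds "this" for the whole wait. */
    public synchronized boolean afterDisconnect(long timeoutMs) throws InterruptedException {
        return waitUntilOpen(timeoutMs);
    }
}
```

With an untimed channelLock.wait() in place of wait(remainingMs), afterDisconnect in this model would never return once the peer is gone, and every other thread calling a synchronized method on the same object would stay BLOCKED, which matches the symptom described above.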
Can anyone confirm or refute my logic here? This certainly seems like it could
explain my symptoms.
Kind regards.
Artur
--
You received this message because you are subscribed to the Google Groups
"Jenkins Developers" group.