Hi,

I am trying to debug the following symptom:
Jenkins started a slave. The slave then died (the machine hung and never had a
chance to communicate back to the master). Jenkins tries to restart it but is
unable to. When I try to restart the slave manually, nothing happens: the slave
log is empty and stays empty, with the spinning icon running indefinitely.

I had a look at the thread dump and saw a number of threads blocked, waiting
on a lock held by the following thread:

"Computer.threadPoolForRemoting [#14515]" Id=577044 Group=main WAITING on com.trilead.ssh2.channel.Channel@76dd2191
        at java.lang.Object.wait(Native Method)
        -  waiting on com.trilead.ssh2.channel.Channel@76dd2191
        at java.lang.Object.wait(Object.java:503)
        at com.trilead.ssh2.channel.ChannelManager.waitUntilChannelOpen(ChannelManager.java:109)
        at com.trilead.ssh2.channel.ChannelManager.openSessionChannel(ChannelManager.java:583)
        at com.trilead.ssh2.Session.<init>(Session.java:41)
        at com.trilead.ssh2.Connection.openSession(Connection.java:1129)
        -  locked com.trilead.ssh2.Connection@1e553bf
        at com.trilead.ssh2.SFTPv3Client.<init>(SFTPv3Client.java:99)
        at com.trilead.ssh2.SFTPv3Client.<init>(SFTPv3Client.java:119)
        at hudson.plugins.sshslaves.SSHLauncher.afterDisconnect(SSHLauncher.java:1160)
        -  locked hudson.plugins.sshslaves.SSHLauncher@3c884383
        at hudson.slaves.SlaveComputer$3.run(SlaveComputer.java:547)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)

        Number of locked synchronizers = 1
        - java.util.concurrent.ThreadPoolExecutor$Worker@638f4d22
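The same "who is blocked behind whom" view can also be pulled out of a live
JVM instead of read off a dump by hand. Below is a minimal sketch using only
the standard java.lang.management API; the class name BlockedThreads is
hypothetical and nothing here is part of Jenkins or the ssh-slaves plugin. It
lists each BLOCKED thread, the monitor it wants, and the state of the thread
that owns that monitor.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Jenkins or the ssh-slaves plugin.
public class BlockedThreads {
    // Returns one line per BLOCKED thread: the monitor it is trying to enter,
    // the thread that currently owns that monitor, and the owner's own state.
    public static List<String> describe() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> lines = new ArrayList<>();
        // Lock name and lock owner are reported even without the extra
        // locked-monitor/synchronizer details, so request neither.
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info == null || info.getThreadState() != Thread.State.BLOCKED) {
                continue;
            }
            long ownerId = info.getLockOwnerId(); // -1 if no owner is known
            ThreadInfo owner = ownerId > 0 ? mx.getThreadInfo(ownerId) : null;
            String ownerState = owner == null ? "unknown" : owner.getThreadState().toString();
            lines.add(String.format("\"%s\" BLOCKED on %s, owned by \"%s\" (%s)",
                    info.getThreadName(), info.getLockName(),
                    info.getLockOwnerName(), ownerState));
        }
        return lines;
    }

    public static void main(String[] args) {
        describe().forEach(System.out::println);
        // findDeadlockedThreads() reports only cycles of threads waiting to
        // acquire monitors or ownable synchronizers; it returns null if none.
        long[] cycles = ManagementFactory.getThreadMXBean().findDeadlockedThreads();
        System.out.println("deadlock cycles involve "
                + (cycles == null ? 0 : cycles.length) + " threads");
    }
}
```

Because findDeadlockedThreads() only looks for cycles, a thread parked in
Object.wait() while still holding another monitor (the shape of the dump above)
is not part of any cycle, so the JVM's built-in deadlock detection would not be
expected to flag this situation even though the blocked threads never proceed.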

Looking at the code of afterDisconnect in hudson.plugins.sshslaves.SSHLauncher
(SSHLauncher.java), I see no sign of code that deals with timeouts. Looking
further up the stack, I wonder what happens when openSessionChannel tries to
open a channel to the slave but the slave is already dead on the other side.
The code does not look like it times out. If that is the case, and whatever is
on the other side of the channel that is supposed to respond is also dead, it
seems to me that waitUntilChannelOpen will never return and will hang forever.
The hudson.plugins.sshslaves.SSHLauncher lock will then never be released, and
any other thread wanting that lock will block forever, i.e. an effective
deadlock.
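To check the reasoning in isolation, here is a minimal, self-contained sketch
in plain Java. FakeChannel and FakeLauncher are hypothetical stand-ins, not the
trilead or ssh-slaves classes, and the assumption is simply that nothing ever
signals the channel once the remote host is dead. FakeLauncher holds its own
monitor while waiting for the channel, mirroring the "locked
hudson.plugins.sshslaves.SSHLauncher" frame above. The untimed wait never
returns if markOpen() is never called, so every other synchronized method on
the launcher blocks; the timed variant checks a deadline in the wait loop,
gives up, and releases the monitor. It illustrates the general
guarded-wait-with-deadline pattern and is not a proposed patch.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for an SSH channel. Some receiver thread would call
// markOpen() when the remote side confirms the channel; if the remote host is
// dead, nothing ever does.
class FakeChannel {
    private boolean open = false;

    synchronized void markOpen() {
        open = true;
        notifyAll();
    }

    // Untimed guarded wait: returns only after markOpen(), otherwise never.
    synchronized void waitUntilOpen() throws InterruptedException {
        while (!open) {
            wait();
        }
    }

    // Deadline-bounded guarded wait: returns false if the channel is still not
    // open when the deadline passes. The loop also absorbs spurious wakeups.
    synchronized boolean waitUntilOpen(long timeout, TimeUnit unit) throws InterruptedException {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        while (!open) {
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return false;
            }
            // wait(0) would mean "wait forever", so never pass less than 1 ms.
            wait(Math.max(1L, TimeUnit.NANOSECONDS.toMillis(remainingNanos)));
        }
        return true;
    }
}

// Hypothetical stand-in for the launcher: like the SSHLauncher frame in the
// dump, it holds its own monitor while waiting for the channel.
class FakeLauncher {
    synchronized void afterDisconnectUntimed(FakeChannel ch) throws InterruptedException {
        ch.waitUntilOpen(); // with a dead peer, the launcher monitor is held forever
    }

    synchronized boolean afterDisconnect(FakeChannel ch, long timeoutMillis) throws InterruptedException {
        return ch.waitUntilOpen(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    // Any other synchronized method has to acquire the same monitor.
    synchronized String status() {
        return "idle";
    }
}

public class ChannelTimeoutDemo {
    public static void main(String[] args) throws InterruptedException {
        FakeLauncher launcher = new FakeLauncher();
        FakeChannel deadPeer = new FakeChannel(); // markOpen() is never called

        long start = System.nanoTime();
        boolean opened = launcher.afterDisconnect(deadPeer, 200);
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        System.out.println("opened=" + opened + " after ~" + elapsedMs + " ms");

        // The timed wait gave up and released the launcher monitor. Had
        // afterDisconnectUntimed been running on another thread, this call
        // would block indefinitely instead.
        System.out.println("status=" + launcher.status());
    }
}
```

Running main prints opened=false after roughly 200 ms, followed by
status=idle. Swapping in afterDisconnectUntimed on one thread and calling
status() from another leaves the second thread BLOCKED for as long as the
first one waits, which is the pattern described above.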

Can anyone confirm or refute my logic here? It certainly seems as if it could
explain my symptoms.

Kind regards.

Artur

-- 
You received this message because you are subscribed to the Google Groups 
"Jenkins Developers" group.