[
https://issues.jenkins-ci.org/browse/JENKINS-13330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=163390#comment-163390
]
Kohsuke Kawaguchi commented on JENKINS-13330:
---------------------------------------------
#2284 is waiting for the shell script build step to complete. Later builds are
past that point, but those are blocked at the point of e-mail notification,
waiting for the previous build to complete (or else it won't be able to decide
what to send out in the e-mail.)
So at this point, my reading of this is that this is not a bug in Jenkins but
simply a hang in the user script, but if you suspect otherwise, please wait for
the next hang to occur, then obtain the thread dump both on the master and the
slave, and check what the first blocked build is doing.
As I've explained above, concurrent executions of the same job can block if the
earlier build is blocked, so we need to focus on the root cause, which is why
the first of the blocked builds is blocking.
> Jenkins slave hangs in post build phase
> ---------------------------------------
>
> Key: JENKINS-13330
> URL: https://issues.jenkins-ci.org/browse/JENKINS-13330
> Project: Jenkins
> Issue Type: Bug
> Components: master-slave, slave-status
> Environment: RHEL 5, both master and all slaves.
> Jenkins is running inside of Tomcat
> Reporter: Clark Wright
> Priority: Critical
> Attachments: jenkins-stall-threaddump.gz,
> jenkins-stall-threaddump.gz, Screenshot-galleon_allIntegration #1196 Console
> [Jenkins] - Mozilla Firefox.png
>
>
> We have an intermittent problem with slaves hanging AFTER the job itself is
> finished. In the post processing step (?) what we see is that the console log
> has this line:
> Description set: vap_current_iter-2012_03_29_19_01_03
> And then nothing. Usually, it will look like this:
> Description set: prod_pull-2012_03_28_19_01_03
> Notifying upstream build armada_Launch_prod_pull #13 of job completion
> Project armada_Launch_prod_pull still waiting for 1 builds to complete
> Notifying upstream projects of job completion
> Notifying upstream of completion: armada_Launch_prod_pull #13
> Finished: SUCCESS
> I setup a logger for hudson.model.Run, and it currently has this :
> {noformat}
> at java.lang.Thread.run(Thread.java:619)
> Mar 30, 2012 12:44:00 PM hudson.model.Run run
> INFO: galleon_allUnit #1134 main build action completed: SUCCESS
> Mar 30, 2012 12:44:00 PM hudson.model.Run setResult
> FINE: galleon_allUnit #1134 : result is set to SUCCESS
> java.lang.Exception
> at hudson.model.Run.setResult(Run.java:352)
> at hudson.model.Run.run(Run.java:1410)
> at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
> at hudson.model.ResourceController.execute(ResourceController.java:88)
> at hudson.model.Executor.run(Executor.java:238)
> {noformat}
> Repeated for every hung slave.
> The main hudson log doesn't have any additional information.
> Disconnecting the slave has no effect.
> Trying to do an orderly shutdown of Jenkins has no effect (jenkins actually
> appears to hang on shutdown).
> The only way we have found to recover is to kill -9 the tomcat process.
> The tread dump for one of the slaves (they are all the same) is:
> {noformat}
> Thread Dump
> Channel reader thread: channel
> "Channel reader thread: channel" Id=9 Group=main RUNNABLE (in native)
> at java.io.FileInputStream.readBytes(Native Method)
> at java.io.FileInputStream.read(FileInputStream.java:199)
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
> - locked java.io.BufferedInputStream@1ae615a
> at
> java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2249)
> at
> java.io.ObjectInputStream$BlockDataInputStream.peek(ObjectInputStream.java:2542)
> at
> java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2552)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1297)
> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
> at hudson.remoting.Channel$ReaderThread.run(Channel.java:1030)
> main
> "main" Id=1 Group=main WAITING on hudson.remoting.Channel@e1d5ea
> at java.lang.Object.wait(Native Method)
> - waiting on hudson.remoting.Channel@e1d5ea
> at java.lang.Object.wait(Object.java:485)
> at hudson.remoting.Channel.join(Channel.java:766)
> at hudson.remoting.Launcher.main(Launcher.java:420)
> at hudson.remoting.Launcher.runWithStdinStdout(Launcher.java:366)
> at hudson.remoting.Launcher.run(Launcher.java:206)
> at hudson.remoting.Launcher.main(Launcher.java:168)
> Ping thread for channel hudson.remoting.Channel@e1d5ea:channel
> "Ping thread for channel hudson.remoting.Channel@e1d5ea:channel" Id=10
> Group=main TIMED_WAITING
> at java.lang.Thread.sleep(Native Method)
> at hudson.remoting.PingThread.run(PingThread.java:86)
> Pipe writer thread: channel
> "Pipe writer thread: channel" Id=12 Group=main WAITING on
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@14263ed
> at sun.misc.Unsafe.park(Native Method)
> - waiting on
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@14263ed
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
> at
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
> at
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
> at java.lang.Thread.run(Thread.java:619)
> pool-1-thread-267
> "pool-1-thread-267" Id=285 Group=main RUNNABLE
> at sun.management.ThreadImpl.dumpThreads0(Native Method)
> at sun.management.ThreadImpl.dumpAllThreads(ThreadImpl.java:374)
> at hudson.Functions.getThreadInfos(Functions.java:872)
> at
> hudson.util.RemotingDiagnostics$GetThreadDump.call(RemotingDiagnostics.java:93)
> at
> hudson.util.RemotingDiagnostics$GetThreadDump.call(RemotingDiagnostics.java:89)
> at hudson.remoting.UserRequest.perform(UserRequest.java:118)
> at hudson.remoting.UserRequest.perform(UserRequest.java:48)
> at hudson.remoting.Request$2.run(Request.java:287)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:619)
> Number of locked synchronizers = 1
> - java.util.concurrent.locks.ReentrantLock$NonfairSync@1186f88
> Finalizer
> "Finalizer" Id=3 Group=system WAITING on
> java.lang.ref.ReferenceQueue$Lock@1798fdd
> at java.lang.Object.wait(Native Method)
> - waiting on java.lang.ref.ReferenceQueue$Lock@1798fdd
> at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
> at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
> at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)
> Reference Handler
> "Reference Handler" Id=2 Group=system WAITING on
> java.lang.ref.Reference$Lock@1d40442
> at java.lang.Object.wait(Native Method)
> - waiting on java.lang.ref.Reference$Lock@1d40442
> at java.lang.Object.wait(Object.java:485)
> at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
> Signal Dispatcher
> "Signal Dispatcher" Id=4 Group=system RUNNABLE
> {noformat}
> Any ideas on how to better recover or prevent this would be greatly
> appreciated.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.jenkins-ci.org/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira