Clark Wright created JENKINS-13330:
--------------------------------------

             Summary: Jenkins slave hangs in post build phase
                 Key: JENKINS-13330
                 URL: https://issues.jenkins-ci.org/browse/JENKINS-13330
             Project: Jenkins
          Issue Type: Bug
          Components: master-slave, slave-status
         Environment: RHEL 5, both master and all slaves.
Jenkins is running inside of Tomcat
            Reporter: Clark Wright
            Priority: Critical
         Attachments: Screenshot-galleon_allIntegration #1196 Console [Jenkins] - Mozilla Firefox.png

We have an intermittent problem with slaves hanging AFTER the job itself has 
finished. During what appears to be the post-build step, the last line in the 
console log is:

Description set: vap_current_iter-2012_03_29_19_01_03

And then nothing. Normally, the log continues like this:

Description set: prod_pull-2012_03_28_19_01_03
Notifying upstream build armada_Launch_prod_pull #13 of job completion
Project armada_Launch_prod_pull still waiting for 1 builds to complete
Notifying upstream projects of job completion
Notifying upstream of completion: armada_Launch_prod_pull #13
Finished: SUCCESS

I set up a logger for hudson.model.Run, and it currently shows this:

    at java.lang.Thread.run(Thread.java:619)

Mar 30, 2012 12:44:00 PM hudson.model.Run run
INFO: galleon_allUnit #1134 main build action completed: SUCCESS
Mar 30, 2012 12:44:00 PM hudson.model.Run setResult
FINE: galleon_allUnit #1134 : result is set to SUCCESS
java.lang.Exception
    at hudson.model.Run.setResult(Run.java:352)
    at hudson.model.Run.run(Run.java:1410)
    at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
    at hudson.model.ResourceController.execute(ResourceController.java:88)
    at hudson.model.Executor.run(Executor.java:238)

Repeated for every hung slave.
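
For anyone trying to reproduce the FINE-level output above, here is a minimal 
sketch of one way to raise the hudson.model.Run logger to FINE using plain 
java.util.logging. The class name RunLoggerSketch is hypothetical, and this is 
not necessarily how the logger above was configured (Jenkins log recorders in 
the UI are another route).

```java
import java.util.logging.ConsoleHandler;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper, not part of Jenkins.
public class RunLoggerSketch {
    // Keep a strong reference: java.util.logging holds loggers weakly,
    // so an unreferenced logger can be collected and lose its level.
    private static final Logger RUN_LOGGER = Logger.getLogger("hudson.model.Run");

    /** Raises the logger to FINE and attaches a handler that passes FINE records. */
    public static Logger enableFineLogging() {
        RUN_LOGGER.setLevel(Level.FINE);
        Handler handler = new ConsoleHandler(); // writes to System.err
        handler.setLevel(Level.FINE);           // ConsoleHandler defaults to INFO
        RUN_LOGGER.addHandler(handler);
        return RUN_LOGGER;
    }

    public static void main(String[] args) {
        Logger logger = enableFineLogging();
        logger.fine("example FINE record");
    }
}
```

Both the logger and the handler need their level lowered; raising only the 
logger leaves FINE records filtered out by the handler.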

The main hudson log doesn't have any additional information.

Disconnecting the slave has no effect.

Trying to do an orderly shutdown of Jenkins has no effect (Jenkins itself 
appears to hang on shutdown).

The only way we have found to recover is to kill -9 the Tomcat process.

The thread dump for one of the slaves (they are all the same) is:

Thread Dump
Channel reader thread: channel

"Channel reader thread: channel" Id=9 Group=main RUNNABLE (in native)
    at java.io.FileInputStream.readBytes(Native Method)
    at java.io.FileInputStream.read(FileInputStream.java:199)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
    -  locked java.io.BufferedInputStream@1ae615a
    at java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2249)
    at java.io.ObjectInputStream$BlockDataInputStream.peek(ObjectInputStream.java:2542)
    at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2552)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1297)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
    at hudson.remoting.Channel$ReaderThread.run(Channel.java:1030)


main

"main" Id=1 Group=main WAITING on hudson.remoting.Channel@e1d5ea
    at java.lang.Object.wait(Native Method)
    -  waiting on hudson.remoting.Channel@e1d5ea
    at java.lang.Object.wait(Object.java:485)
    at hudson.remoting.Channel.join(Channel.java:766)
    at hudson.remoting.Launcher.main(Launcher.java:420)
    at hudson.remoting.Launcher.runWithStdinStdout(Launcher.java:366)
    at hudson.remoting.Launcher.run(Launcher.java:206)
    at hudson.remoting.Launcher.main(Launcher.java:168)


Ping thread for channel hudson.remoting.Channel@e1d5ea:channel

"Ping thread for channel hudson.remoting.Channel@e1d5ea:channel" Id=10 Group=main TIMED_WAITING
    at java.lang.Thread.sleep(Native Method)
    at hudson.remoting.PingThread.run(PingThread.java:86)


Pipe writer thread: channel

"Pipe writer thread: channel" Id=12 Group=main WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@14263ed
    at sun.misc.Unsafe.park(Native Method)
    -  waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@14263ed
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
    at java.lang.Thread.run(Thread.java:619)


pool-1-thread-267

"pool-1-thread-267" Id=285 Group=main RUNNABLE
    at sun.management.ThreadImpl.dumpThreads0(Native Method)
    at sun.management.ThreadImpl.dumpAllThreads(ThreadImpl.java:374)
    at hudson.Functions.getThreadInfos(Functions.java:872)
    at hudson.util.RemotingDiagnostics$GetThreadDump.call(RemotingDiagnostics.java:93)
    at hudson.util.RemotingDiagnostics$GetThreadDump.call(RemotingDiagnostics.java:89)
    at hudson.remoting.UserRequest.perform(UserRequest.java:118)
    at hudson.remoting.UserRequest.perform(UserRequest.java:48)
    at hudson.remoting.Request$2.run(Request.java:287)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)

    Number of locked synchronizers = 1
    - java.util.concurrent.locks.ReentrantLock$NonfairSync@1186f88


Finalizer

"Finalizer" Id=3 Group=system WAITING on java.lang.ref.ReferenceQueue$Lock@1798fdd
    at java.lang.Object.wait(Native Method)
    -  waiting on java.lang.ref.ReferenceQueue$Lock@1798fdd
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
    at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)


Reference Handler

"Reference Handler" Id=2 Group=system WAITING on java.lang.ref.Reference$Lock@1d40442
    at java.lang.Object.wait(Native Method)
    -  waiting on java.lang.ref.Reference$Lock@1d40442
    at java.lang.Object.wait(Object.java:485)
    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)


Signal Dispatcher

"Signal Dispatcher" Id=4 Group=system RUNNABLE

Any ideas on how to better recover or prevent this would be greatly appreciated.
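
In case it helps with triage, a dump like the one above could be narrowed to 
just the blocked threads. The following is only a sketch using the standard 
java.lang.management API (the same dumpAllThreads call that appears in the 
pool-1-thread-267 trace); the class and method names are hypothetical and this 
is not Jenkins code.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Jenkins.
public class WaitingThreadsSketch {
    /** Returns one summary line per live thread that is WAITING or TIMED_WAITING. */
    public static List<String> waitingThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> result = new ArrayList<>();
        // false, false: skip locked-monitor and locked-synchronizer details.
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info == null) {
                continue; // defensive: skip any missing entry
            }
            Thread.State state = info.getThreadState();
            if (state == Thread.State.WAITING || state == Thread.State.TIMED_WAITING) {
                String lock = info.getLockName(); // null when not waiting on a lock object
                result.add("\"" + info.getThreadName() + "\" Id=" + info.getThreadId()
                        + " " + state + (lock != null ? " on " + lock : ""));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (String line : waitingThreads()) {
            System.out.println(line);
        }
    }
}
```

Run inside the slave JVM, this would list threads such as "main" and the ping 
thread from the dump above; run on the master, it might show where the 
executor for the hung build is blocked.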


