Issue Type: Bug
Assignee: Unassigned
Components: core
Created: 23/May/14 11:20 PM
Description:

Today we encountered a situation where one of our slaves was completely locked up.

  • Jobs would launch but get no further than:
      Building remotely on XXX in workspace YYY
      Starting build job ZZZ

  • No apparent problematic entries in the master log
  • Status showed the slave as online
  • No apparent problematic entries in the slave log; entries simply stopped at the time the problem started

A thread dump showed that all threads were stuck in the following stack (full thread dump attached):

"pool-1-thread-10786" prio=3 tid=0x08461800 nid=0x4e43 in Object.wait() [0xb5088000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	- waiting on <0xbade43b0> (a hudson.remoting.PipeWindow$Real)
	at java.lang.Object.wait(Object.java:485)
	at hudson.remoting.PipeWindow$Real.get(PipeWindow.java:177)
	- locked <0xbade43b0> (a hudson.remoting.PipeWindow$Real)
	at hudson.remoting.ProxyOutputStream._write(ProxyOutputStream.java:118)
	- locked <0xbade43d8> (a hudson.remoting.ProxyOutputStream)
	at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:103)
	at hudson.Util.copyStream(Util.java:454)
	at hudson.FilePath$28.call(FilePath.java:1623)
	at hudson.FilePath$28.call(FilePath.java:1617)
	at hudson.remoting.UserRequest.perform(UserRequest.java:118)
	at hudson.remoting.UserRequest.perform(UserRequest.java:48)
	at hudson.remoting.Request$2.run(Request.java:326)
	at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
	at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
	at java.util.concurrent.FutureTask.run(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at hudson.remoting.Engine$1$1.run(Engine.java:60)
	at java.lang.Thread.run(Unknown Source)
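A dump like the one above is what `jstack <pid>` prints for the slave process. To check the same symptom from code running inside the slave JVM, a minimal sketch can scan `Thread.getAllStackTraces()` and count the threads that currently have a given frame on their stack. The class name `StuckThreadScan` is hypothetical, and how the code is run inside the slave JVM is left open; this is a diagnostic aid, not part of Jenkins.

```java
import java.util.Map;

/**
 * Counts live threads in the current JVM whose stack contains a given frame.
 * Sketch only: it must run inside the JVM being diagnosed (here, the slave).
 */
public class StuckThreadScan {

    /** Returns how many live threads currently have className.methodName on their stack. */
    static int countThreadsIn(String className, String methodName) {
        int count = 0;
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            for (StackTraceElement frame : entry.getValue()) {
                if (className.equals(frame.getClassName())
                        && methodName.equals(frame.getMethodName())) {
                    count++;
                    break; // count each thread at most once
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Class and method name copied from the stack trace above.
        int stuck = countThreadsIn("hudson.remoting.PipeWindow$Real", "get");
        System.out.println("Threads waiting in PipeWindow$Real.get(): " + stuck);
    }
}
```

A count that keeps growing across repeated runs, while no build makes progress, would match the picture in the attached dump.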

Looking at the code of PipeWindow$Real.get(), it does not seem impossible that threads get stuck in get() and are never woken up if the pipe window fills up, but I can't point to a concrete problem.
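The suspicion above can be illustrated with a deliberately simplified model of a sender-side flow-control window. In this model a writer that finds the window full calls `wait()` and relies on `increase()`, triggered by an acknowledgement from the other side, to call `notifyAll()`. If that acknowledgement is lost or never sent, nothing wakes the writer, which would be consistent with the WAITING state in the trace. `SketchWindow` and its methods are hypothetical; this is not the actual `hudson.remoting.PipeWindow` code and does not show that this is what happened here. The timed `get(min, timeoutMillis)` variant is one possible way to make such a hang visible, not a proposed fix.

```java
/**
 * Simplified sender-side flow-control window, loosely modeled on the
 * wait-in-get() pattern visible in the stack trace. NOT the real
 * hudson.remoting.PipeWindow code; names and fields are hypothetical.
 */
public class SketchWindow {
    private final int capacity;
    private int available; // bytes the sender may still write without an ack

    public SketchWindow(int capacity) {
        this.capacity = capacity;
        this.available = capacity;
    }

    /** Blocks until at least min bytes are free. Never returns if no ack arrives. */
    public synchronized int get(int min) throws InterruptedException {
        while (available < min) {
            wait(); // only increase() calls notifyAll()
        }
        return available;
    }

    /** Like get(min), but returns -1 instead of waiting longer than timeoutMillis. */
    public synchronized int get(int min, long timeoutMillis) throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (available < min) {
            long remainingMillis = (deadline - System.nanoTime()) / 1_000_000L;
            if (remainingMillis <= 0) {
                return -1; // window still full: report it instead of hanging
            }
            wait(remainingMillis);
        }
        return available;
    }

    /** Sender side: n bytes were written into the window. */
    public synchronized void decrease(int n) {
        available -= n;
    }

    /** Receiver side: n bytes were acknowledged; wake any blocked writers. */
    public synchronized void increase(int n) {
        available = Math.min(capacity, available + n);
        notifyAll();
    }

    public static void main(String[] args) throws InterruptedException {
        SketchWindow window = new SketchWindow(4);
        window.decrease(window.get(1)); // writer uses up the whole window
        // No increase(): the ack is "lost", so get(1) would now wait forever.
        System.out.println(window.get(1, 100)); // prints -1 after ~100 ms
        window.increase(4);                     // the ack finally arrives
        System.out.println(window.get(1, 100)); // prints 4
    }
}
```

In this model the only path out of `wait()` is the receiver's acknowledgement, so a single lost ack is enough to park every writer that shares the window.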

I checked the issue tracker and found JENKINS-9540 and JENKINS-22807, but those seem different, since they involve specific messages in the logs.

Could this be a deadlock in the slave remoting code?

Environment: Version 1.532.1
Project: Jenkins
Priority: Major
Reporter: Joe Ammann