Change By: Arcadiy Ivanov (19/Nov/14 11:58 PM)
Description: Whenever a remote connection to a NIO JNLP slave is abruptly terminated, Channel.terminate() does not clean up the exported FastPipedOutputStream. As a result, the FastPipedInputStream leaks and loops forever waiting for its buffer to be filled.

Example:

{noformat}
"Channel reader thread: Channel to Maven [/home/ec2-user/devhome/current/jdk/bin/java, -Xmx1g, -XX:MaxPermSize=1g, -Djava.awt.headless=true, -cp, /home/ec2-user/maven31-agent.jar:/home/ec2-user/devhome/current/maven/boot/plexus-classworlds-2.5.1.jar:/home/ec2-user/devhome/current/maven/conf/logging, jenkins.maven3.agent.Maven31Main, /home/ec2-user/devhome/current/maven, /tmp/slave.jar, /home/ec2-user/maven31-interceptor.jar, /home/ec2-user/maven3-interceptor-commons.jar, 26853]" prio=10 tid=0x00002b143d5a6800 nid=0xf0cb in Object.wait() [0x00002b1434e0c000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x000000070559dce8 0x0000000702df6a88> (a [B)
        at hudson.remoting.FastPipedInputStream.read(FastPipedInputStream.java:175)
        - locked <0x000000070559dce8 0x0000000702df6a88> (a [B)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
        - locked <0x000000070559c378 0x0000000702df5118> (a java.io.BufferedInputStream)
        at hudson.remoting.FlightRecorderInputStream.read(FlightRecorderInputStream.java:82)
        at hudson.remoting.ChunkedInputStream.readHeader(ChunkedInputStream.java:72)
        at hudson.remoting.ChunkedInputStream.readUntilBreak(ChunkedInputStream.java:103)
        at hudson.remoting.ChunkedCommandTransport.readBlock(ChunkedCommandTransport.java:33)
        at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
        at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)

"Executor #0 for Primary Koji Slave Build Machine (i-26ed9ecc) : executing Koji - WildFly #13 / waiting for hudson.remoting.Channel@5f9da360:Channel to Maven [/home/ec2-user/devhome/current/jdk/bin/java, -Xmx1g, -XX:MaxPermSize=1g, -Djava.awt.headless=true, -cp, /home/ec2-user/maven31-agent.jar:/home/ec2-user/devhome/current/maven/boot/plexus-classworlds-2.5.1.jar:/home/ec2-user/devhome/current/maven/conf/logging, jenkins.maven3.agent.Maven31Main, /home/ec2-user/devhome/current/maven, /tmp/slave.jar, /home/ec2-user/maven31-interceptor.jar, /home/ec2-user/maven3-interceptor-commons.jar, 26853]" prio=10 tid=0x00002b144b97f000 nid=0xf0bf in Object.wait() [0x00002b1436624000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x000000070559d660 0x0000000702df6400> (a hudson.remoting.UserRequest)
        at hudson.remoting.Request.call(Request.java:146)
        - locked <0x000000070559d660 0x0000000702df6400> (a hudson.remoting.UserRequest)
        at hudson.remoting.Channel.call(Channel.java:751)
        at hudson.maven.ProcessCache$MavenProcess.call(ProcessCache.java:161)
        at hudson.maven.MavenModuleSetBuild$MavenModuleSetBuildExecution.doRun(MavenModuleSetBuild.java:840)
        at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:533)
        at hudson.model.Run.execute(Run.java:1745)
        at hudson.maven.MavenModuleSetBuild.run(MavenModuleSetBuild.java:529)
        at hudson.model.ResourceController.execute(ResourceController.java:89)
        at hudson.model.Executor.run(Executor.java:240)
{noformat}

Slave computer log:
{noformat}
JNLP agent connected from /10.40.1.117
<===[JENKINS REMOTING CAPACITY]===>Slave.jar version: 2.47
This is a Unix slave
Slave successfully connected and online
JNLP agent connected
ERROR: Connection terminated
java.io.IOException: Connection aborted: org.jenkinsci.remoting.nio.NioChannelHub$MonoNioTransport@f4c5cfc[name=Primary Koji Slave Build Machine (i-26ed9ecc)]
at org.jenkinsci.remoting.nio.NioChannelHub$NioTransport.abort(NioChannelHub.java:211)
at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:631)
at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:197)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
at org.jenkinsci.remoting.nio.FifoBuffer$Pointer.receive(FifoBuffer.java:136)
at org.jenkinsci.remoting.nio.FifoBuffer.receive(FifoBuffer.java:306)
at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:564)
... 6 more
{noformat}

Discussion:

The problem, ultimately, lies in how the Channel is terminated.

# NioTransport.abort(e) calls ByteArrayReceiver.terminate(e).
# ByteArrayReceiver.terminate(e) calls CommandReceiver.terminate(e).
# CommandReceiver is constructed as an anonymous type in transportSetup within the Channel constructor and redirects the call to Channel.this.terminate(e).
# Channel.this.terminate(e) does NOT clean up any of the exported streams in the ExportTable.
# Since the channel doesn't remove the FastPipedOutputStream from the ExportTable, FastPipedInputStream.source() always returns a reference to the FastPipedOutputStream, and the IN side of the pipe waits on the buffer forever.
# Since FastPipedInputStream.read() waits forever for data in the buffer, the SynchronousCommandTransport$ReaderThread never terminates.
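
The chain above can be illustrated with a small, self-contained sketch. It does not use the real remoting classes: SketchSink, SketchSource and SketchChannel are hypothetical stand-ins for FastPipedOutputStream, FastPipedInputStream and Channel, and the weak reference from the read side to the write side is an assumption drawn from the description of source() above. The reader here gives up after a bounded number of polls so the demo terminates; the real reader, as reported, keeps waiting. Reference.reachabilityFence requires Java 9 or later.

```java
import java.lang.ref.Reference;
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

final class PipeLeakSketch {
    static final int EOF = -1;           // reader saw the write side go away
    static final int STILL_BLOCKED = -2; // reader would still be waiting

    /** Hypothetical write side of the pipe (stand-in for FastPipedOutputStream). */
    static final class SketchSink {
        private volatile boolean closed;
        void close() { closed = true; }
        boolean isClosed() { return closed; }
    }

    /** Hypothetical read side of the pipe (stand-in for FastPipedInputStream). */
    static final class SketchSource {
        private final Object buffer = new Object();
        private final WeakReference<SketchSink> sink;

        SketchSource(SketchSink sink) { this.sink = new WeakReference<>(sink); }

        /** Polls for data; no data is ever written in this sketch. */
        int read(int maxPolls) {
            synchronized (buffer) {
                for (int i = 0; i < maxPolls; i++) {
                    SketchSink s = sink.get();
                    if (s == null || s.isClosed()) {
                        return EOF; // write side closed or collected
                    }
                    try {
                        buffer.wait(5); // timed wait on the buffer monitor
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return STILL_BLOCKED;
                    }
                }
            }
            return STILL_BLOCKED;
        }
    }

    /** Hypothetical channel that keeps exported objects strongly reachable. */
    static final class SketchChannel {
        private final List<SketchSink> exportTable = new ArrayList<>();

        void export(SketchSink sink) { exportTable.add(sink); }

        void terminate(boolean cleanUpExports) {
            if (cleanUpExports) {
                for (SketchSink s : exportTable) {
                    s.close();
                }
                exportTable.clear();
            }
            // Without cleanup the export table still pins every sink.
        }
    }

    static int readAfterTerminate(boolean cleanUpExports) {
        SketchChannel channel = new SketchChannel();
        SketchSink sink = new SketchSink();
        channel.export(sink);
        SketchSource source = new SketchSource(sink);
        channel.terminate(cleanUpExports);
        int result = source.read(20);
        // Keep the channel (and its export table) reachable during the read.
        Reference.reachabilityFence(channel);
        return result;
    }

    public static void main(String[] args) {
        System.out.println("without cleanup: " + readAfterTerminate(false));
        System.out.println("with cleanup:    " + readAfterTerminate(true));
    }
}
```

In this sketch, terminating without cleanup leaves the reader polling until its poll budget runs out (STILL_BLOCKED), while closing the exported sinks during terminate lets the reader observe end of stream (EOF) on its next check. Whether the actual fix should close the exported streams, drop them from the ExportTable, or both is left open here.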