[JIRA] [core] (JENKINS-24155) Jenkins Slaves Go Offline In Large Quantities and Don't Reconnect Until Reboot

[email protected] (JIRA) Tue, 19 Aug 2014 09:07:44 -0700

I've seen the same issue on a Windows slave running a self-built version of 1.577-SNAPSHOT. The slave error log suggests that the slave saw a connection reset, but when it tried to reconnect the master thought the slave was still connected, so the connection retries failed.

Aug 17, 2014 11:05:02 PM hudson.remoting.SynchronousCommandTransport$ReaderThread run
SEVERE: I/O error in channel channel
java.net.SocketException: Connection reset
	at java.net.SocketInputStream.read(Unknown Source)
	at java.net.SocketInputStream.read(Unknown Source)
	at java.io.BufferedInputStream.fill(Unknown Source)
	at java.io.BufferedInputStream.read(Unknown Source)
	at hudson.remoting.FlightRecorderInputStream.read(FlightRecorderInputStream.java:82)
	at hudson.remoting.ChunkedInputStream.readHeader(ChunkedInputStream.java:67)
	at hudson.remoting.ChunkedInputStream.readUntilBreak(ChunkedInputStream.java:93)
	at hudson.remoting.ChunkedCommandTransport.readBlock(ChunkedCommandTransport.java:33)
	at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)


Aug 17, 2014 11:05:02 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Terminated
Aug 17, 2014 11:05:12 PM jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$2$1 onReconnect
INFO: Restarting slave via jenkins.slaves.restarter.WinswSlaveRestarter@5f849b
Aug 17, 2014 11:05:17 PM hudson.remoting.jnlp.Main createEngine
INFO: Setting up slave: Cygnet
Aug 17, 2014 11:05:17 PM hudson.remoting.jnlp.Main$CuiListener <init>
INFO: Jenkins agent is running in headless mode.
Aug 17, 2014 11:05:17 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among [http://jenkins.example/]
Aug 17, 2014 11:05:17 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connecting to jenkins.example:42715
Aug 17, 2014 11:05:17 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Handshaking
Aug 17, 2014 11:05:17 PM hudson.remoting.jnlp.Main$CuiListener error
SEVERE: The server rejected the connection: Cygnet is already connected to this master. Rejecting this connection.
java.lang.Exception: The server rejected the connection: Cygnet is already connected to this master. Rejecting this connection.
	at hudson.remoting.Engine.onConnectionRejected(Engine.java:306)
	at hudson.remoting.Engine.run(Engine.java:276)

Aug 17, 2014 11:06:17 PM hudson.remoting.jnlp.Main createEngine
INFO: Setting up slave: Cygnet
Aug 17, 2014 11:06:17 PM hudson.remoting.jnlp.Main$CuiListener <init>
INFO: Jenkins agent is running in headless mode.
Aug 17, 2014 11:06:17 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among [http://jenkins.example/]
Aug 17, 2014 11:06:17 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connecting to jenkins.example:42715
Aug 17, 2014 11:06:17 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Handshaking
Aug 17, 2014 11:06:17 PM hudson.remoting.jnlp.Main$CuiListener error
SEVERE: The server rejected the connection: Cygnet is already connected to this master. Rejecting this connection.
java.lang.Exception: The server rejected the connection: Cygnet is already connected to this master. Rejecting this connection.
	at hudson.remoting.Engine.onConnectionRejected(Engine.java:306)
	at hudson.remoting.Engine.run(Engine.java:276)

Aug 17, 2014 11:07:18 PM hudson.remoting.jnlp.Main createEngine
INFO: Setting up slave: Cygnet
Aug 17, 2014 11:07:18 PM hudson.remoting.jnlp.Main$CuiListener <init>
INFO: Jenkins agent is running in headless mode.
Aug 17, 2014 11:07:18 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among [http://jenkins.example/]
Aug 17, 2014 11:07:18 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connecting to jenkins.example:42715
Aug 17, 2014 11:07:18 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Handshaking
Aug 17, 2014 11:07:18 PM hudson.remoting.jnlp.Main$CuiListener error
SEVERE: The server rejected the connection: Cygnet is already connected to this master. Rejecting this connection.
java.lang.Exception: The server rejected the connection: Cygnet is already connected to this master. Rejecting this connection.
	at hudson.remoting.Engine.onConnectionRejected(Engine.java:306)
	at hudson.remoting.Engine.run(Engine.java:276)

After 3 restart attempts the Windows service restarter gave up, and unfortunately I didn't attempt to reconnect until after I had restarted the master over 12 hours later.

The corresponding part of the master's log is as follows (only the first restart is included here; the others are equivalent).

Aug 17, 2014 11:05:18 PM hudson.TcpSlaveAgentListener$ConnectionHandler run
INFO: Accepted connection #7 from /192.168.1.115:60293
Aug 17, 2014 11:05:18 PM jenkins.slaves.JnlpSlaveHandshake error
WARNING: TCP slave agent connection handler #7 with /192.168.1.115:60293 is aborted: Cygnet is already connected to this master. Rejecting this connection.
Aug 17, 2014 11:05:18 PM jenkins.slaves.JnlpSlaveHandshake error
WARNING: TCP slave agent connection handler #7 with /192.168.1.115:60293 is aborted: Unrecognized name: Cygnet
Aug 17, 2014 11:06:19 PM hudson.TcpSlaveAgentListener$ConnectionHandler run
INFO: Accepted connection #8 from /192.168.1.115:60308

The master log does not show any evidence of the slave connection being broken in the 12 hours before the master restart.

If the master does not detect that the connection has closed, it will still think that the slave is connected and will refuse reconnections, as seen.
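
To illustrate the suspected failure mode, here is a minimal sketch. It is not the actual Jenkins handshake code: the class StaleConnectionSketch and its methods handshake and onChannelClosed are hypothetical names, and the sketch assumes the master keeps a simple name-to-channel registry that is only cleared when the master itself notices the close.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of the suspected behaviour; not the real Jenkins code. */
public class StaleConnectionSketch {
    // Slaves the master currently believes are connected (name -> connection number).
    private final Map<String, Integer> connected = new ConcurrentHashMap<>();

    /** Returns true if the connection is accepted, false if rejected as "already connected". */
    public boolean handshake(String slaveName, int connectionNumber) {
        return connected.putIfAbsent(slaveName, connectionNumber) == null;
    }

    /** Runs only when the master itself notices that the channel has closed. */
    public void onChannelClosed(String slaveName) {
        connected.remove(slaveName);
    }

    public static void main(String[] args) {
        StaleConnectionSketch master = new StaleConnectionSketch();
        System.out.println(master.handshake("Cygnet", 1)); // true: initial connection
        // The slave sees "Connection reset", but the master never calls onChannelClosed.
        System.out.println(master.handshake("Cygnet", 7)); // false: retry is rejected
        master.onChannelClosed("Cygnet");                  // master finally notices the close
        System.out.println(master.handshake("Cygnet", 8)); // true: retry now succeeds
    }
}
```

In this sketch every retry is rejected for as long as the stale entry remains, which matches the repeated "already connected" rejections in the logs above. Whether the real master behaves this way internally is an assumption on my part.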

Sadly I didn't get a stacktrace of the master's threads to see if any threads were blocked anywhere.
