Hi Oleg,


Thanks for the quick reply. Before I upgraded Jenkins this past weekend, I 
was seeing a very similar (but not identical) intermittent failure with 
Jenkins 2.89.2 LTS (Remoting 3.14) and Swarm Plugin client 3.6 (3.10.2).


My job would start, spin up the VM on AWS, and the swarm client would 
connect back to the master successfully, check out the Git repository, and 
run the tests (which take about 2-3 hours). But in the "publish results" 
stage after the tests, I would frequently see this on the master:


java.nio.channels.ClosedChannelException

at 
org.jenkinsci.remoting.protocol.NetworkLayer.onRecvClosed(NetworkLayer.java:154)

at 
org.jenkinsci.remoting.protocol.impl.NIONetworkLayer.ready(NIONetworkLayer.java:179)

at org.jenkinsci.remoting.protocol.IOHub$OnReady.run(IOHub.java:721)

at 
jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)

Caused: hudson.remoting.ChannelClosedException: Remote call on 
JNLP4-connect connection from 10.110.xyz.xyz/10.110.xyz.xyz:61938 failed. 
The channel is closing down or has closed down

at hudson.remoting.Channel.call(Channel.java:901)

at hudson.FilePath.act(FilePath.java:986)

Caused: java.io.IOException: remote file operation failed: 
/export/home/path/to/ws at hudson.remoting.Channel@71abe1cc:JNLP4-connect 
connection from 10.110.xyz.xyz/10.110.xyz.xyz:61938

at hudson.FilePath.act(FilePath.java:993)

at hudson.FilePath.act(FilePath.java:975)

at hudson.FilePath.mkdirs(FilePath.java:1158)

at 
org.jenkinsci.plugins.workflow.steps.CoreStep$Execution.run(CoreStep.java:79)

at 
org.jenkinsci.plugins.workflow.steps.CoreStep$Execution.run(CoreStep.java:67)

at 
org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution$1$1.call(SynchronousNonBlockingStepExecution.java:49)

at hudson.security.ACL.impersonate(ACL.java:260)

at 
org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution$1.run(SynchronousNonBlockingStepExecution.java:46)

at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)

at java.util.concurrent.FutureTask.run(FutureTask.java:266)

at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

at java.lang.Thread.run(Thread.java:748)

Finished: FAILURE


Unfortunately, I don't have the client side logs when running Swarm Plugin 
client 3.6, since this was before I enabled logging on the client side. I 
will downgrade the Swarm Plugin client from 3.8 (Remoting 3.16) to 3.6 
(Remoting 3.10.2) but keep the logging enabled and update this thread once 
I hit the problem again and collect those logs.


The really bizarre thing about the old failure mode was that the remote 
operation that failed was a FilePath.mkdirs call to the workspace itself, 
which we had been successfully using throughout the duration of the test 
run. There's no way it would have been inaccessible at the file system 
level on the swarm client at the end of the test run during the publish 
results stage (the workspace is on local disk in the VM); rather, it must 
be that the Remoting connection got itself into a pathological state during 
the test run and then failed explicitly at the final publish results stage.


With Swarm Plugin client 3.8 (Remoting 3.16), I see the failure at the very 
beginning of the test run, which in some ways is better (fail fast) but in 
other ways is worse (my tests don't get run).


If there’s anything else I can do to help debug, or a part of the source 
code that I should start reading first to understand what’s happening, then 
let me know.


Thanks,

Basil

On Friday, February 2, 2018 at 2:52:00 AM UTC-8, Oleg Nenashev wrote:
>
> Hard to say for sure.
> I see some suspicious reports in JIRA about Remoting 3.15/3.16. Would it 
> be possible to try Swarm Plugin client with older version to confirm 
> whether the issue existis there.
>
> Thanks in advance,
> Oleg
>
> пятница, 2 февраля 2018 г., 4:58:48 UTC+1 пользователь m...@basilcrow.com 
> написал:
>>
>> Hi all,
>>
>> I am trying to debug an intermittent failure with Remoting. I am familiar 
>> with the overall design of Remoting but not its implementation. I'm not 
>> sure if I should file a bug or not, so I am asking for help debugging here 
>> first. Please let me know if I should file a bug instead. Below I will 
>> present the data I have, and the questions I am trying to answer. I would 
>> appreciate any help debugging from someone who is familiar with the 
>> Remoting code.
>>
>> My Jenkins master is running Jenkins 2.89.3 LTS (Remoting 3.14). My 
>> Jenkins agent is connecting to the master via Swarm Plugin 3.8 (Remoting 
>> 3.16). Also note that the master is not in the same local network as the 
>> swarm client (the swarm client is running in AWS, the master is in a colo).
>>
>> The swarm client connects to the master successfully but then very 
>> quickly the job fails while trying to do a Git checkout:
>>
>> 15:31:47 Running on swarm-client-agent.example.com in 
>> /var/tmp/jenkins/workspace/path/to/workspace
>> [Pipeline] {
>> [Pipeline] ws
>> 15:31:47 Running in /export/home/path/to/ws
>> [Pipeline] {
>> [Pipeline] stage
>> [Pipeline] { (Check out repository)
>> [Pipeline] checkout
>> 15:32:42  > git rev-parse --is-inside-work-tree # timeout=10
>> [Pipeline] }
>> 15:36:35 ERROR: Execution failed
>> 15:36:35 java.nio.channels.ClosedChannelException
>> 15:36:35 Also:   hudson.remoting.Channel$CallSiteStackTrace: Remote call 
>> to JNLP4-connect connection from 10.110.xyz.xyz/10.110.xyz.xyz:44291
>> 15:36:35 at 
>> hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1693)
>> 15:36:35 at hudson.remoting.Request.call(Request.java:192)
>> 15:36:35 at hudson.remoting.Channel.call(Channel.java:907)
>> 15:36:35 at 
>> hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:281)
>> 15:36:35 at com.sun.proxy.$Proxy92.hasGitRepo(Unknown Source)
>> 15:36:35 at 
>> org.jenkinsci.plugins.gitclient.RemoteGitImpl.hasGitRepo(RemoteGitImpl.java:280)
>> 15:36:35 at hudson.plugins.git.GitSCM.retrieveChanges(GitSCM.java:1068)
>> 15:36:35 at hudson.plugins.git.GitSCM.checkout(GitSCM.java:1123)
>> 15:36:35 at 
>> org.jenkinsci.plugins.workflow.steps.scm.SCMStep.checkout(SCMStep.java:113)
>> 15:36:35 at 
>> org.jenkinsci.plugins.workflow.steps.scm.SCMStep$StepExecutionImpl.run(SCMStep.java:85)
>> 15:36:35 at 
>> org.jenkinsci.plugins.workflow.steps.scm.SCMStep$StepExecutionImpl.run(SCMStep.java:75)
>> 15:36:35 at 
>> org.jenkinsci.plugins.workflow.steps.AbstractSynchronousNonBlockingStepExecution$1$1.call(AbstractSynchronousNonBlockingStepExecution.java:47)
>> 15:36:35 at hudson.security.ACL.impersonate(ACL.java:260)
>> 15:36:35 at 
>> org.jenkinsci.plugins.workflow.steps.AbstractSynchronousNonBlockingStepExecution$1.run(AbstractSynchronousNonBlockingStepExecution.java:44)
>> 15:36:35 at 
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>> 15:36:35 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>> 15:36:35 Caused: hudson.remoting.RequestAbortedException
>> 15:36:35 at hudson.remoting.Request.abort(Request.java:329)
>> 15:36:35 at hudson.remoting.Channel.terminate(Channel.java:992)
>> 15:36:35 at 
>> org.jenkinsci.remoting.protocol.impl.ChannelApplicationLayer.onReadClosed(ChannelApplicationLayer.java:208)
>> 15:36:35 at 
>> org.jenkinsci.remoting.protocol.ApplicationLayer.onRecvClosed(ApplicationLayer.java:222)
>> 15:36:35 at 
>> org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.onRecvClosed(ProtocolStack.java:832)
>> 15:36:35 at 
>> org.jenkinsci.remoting.protocol.FilterLayer.onRecvClosed(FilterLayer.java:287)
>> 15:36:35 at 
>> org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.onRecvClosed(SSLEngineFilterLayer.java:172)
>> 15:36:35 at 
>> org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.onRecvClosed(ProtocolStack.java:832)
>> 15:36:35 at 
>> org.jenkinsci.remoting.protocol.NetworkLayer.onRecvClosed(NetworkLayer.java:154)
>> 15:36:35 at 
>> org.jenkinsci.remoting.protocol.impl.NIONetworkLayer.ready(NIONetworkLayer.java:179)
>> 15:36:35 at 
>> org.jenkinsci.remoting.protocol.IOHub$OnReady.run(IOHub.java:721)
>> 15:36:35 at 
>> jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
>> 15:36:35 at 
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>> 15:36:35 at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>> 15:36:35 at java.lang.Thread.run(Thread.java:748)
>>
>>
>> Since this was a swarm client and thus ephemeral, it was removed from the 
>> nodes view on the master and I have no access to any other logs on the 
>> master side.
>>
>> On the swarm client side, I have much more detailed logging. It looks 
>> like it definitely connected properly to the master (which makes sense, 
>> because the master entered the Git checkout stage):
>>
>> 2018-02-01 23:30:51 INFO hudson.plugins.swarm.Client main Client.main 
>> invoked with: [-name swarm-client-agent.example.com -fsroot 
>> /var/tmp/jenkins -master http://jenkins-master.example.com -username 
>> <redacted> -password <redacted> -executors 1 -description Dynamically 
>> configured Jenkins swarm plugin slave swarm-client-agent.example.com 
>> -mode exclusive -deleteExistingClients -disableClientsUniqueId 
>> -disableSslVerification]
>> 2018-02-01 23:30:52 CONFIG hudson.plugins.swarm.SwarmClient <init> 
>> SwarmClient constructed with hudson.plugins.swarm.Options@6f94fa3e
>> 2018-02-01 23:30:52 INFO hudson.plugins.swarm.Client run Discovering 
>> Jenkins master
>> 2018-02-01 23:30:52 CONFIG hudson.plugins.swarm.SwarmClient 
>> discoverFromMasterUrl discoverFromMasterUrl() invoked
>> 2018-02-01 23:30:52 CONFIG hudson.plugins.swarm.SwarmClient 
>> discoverFromMasterUrl Connecting to http://jenkins-master.example.com/ 
>> to configure swarm client.
>> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
>> SLF4J: Defaulting to no-operation (NOP) logger implementation
>> SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for 
>> further details.
>> 2018-02-01 23:30:55 INFO hudson.plugins.swarm.Client run Attempting to 
>> connect to http://jenkins-master.example.com/ 
>> 17e24806-9248-4b9b-86aa-ce97caa34001 with ID 
>> 2018-02-01 23:30:55 SEVERE hudson.plugins.swarm.SwarmClient getCsrfCrumb 
>> Could not obtain CSRF crumb. Response code: 404
>> 2018-02-01 23:30:56 INFO hudson.remoting.jnlp.Main createEngine Setting 
>> up agent: swarm-client-agent.example.com
>> 2018-02-01 23:30:56 INFO hudson.remoting.jnlp.Main$CuiListener <init> 
>> Jenkins agent is running in headless mode.
>> 2018-02-01 23:30:56 WARNING hudson.remoting.Engine startEngine No Working 
>> Directory. Using the legacy JAR Cache location: 
>> /var/tmp/jenkins/.jenkins/cache/jars
>> 2018-02-01 23:30:56 INFO hudson.remoting.jnlp.Main$CuiListener status 
>> Locating server among [http://jenkins-master.example.com/]
>> 2018-02-01 23:30:56 INFO 
>> org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver resolve Remoting 
>> server accepts the following protocols: [JNLP4-connect, Ping]
>> 2018-02-01 23:30:56 INFO hudson.remoting.jnlp.Main$CuiListener status 
>> Agent discovery successful
>>   Agent address: jenkins-master.example.com
>>   Agent port:    53000
>>   Identity:      2e:a8:0f:2c:95:86:84:c4:85:ee:52:d2:38:7b:3f:b5
>> 2018-02-01 23:30:56 INFO hudson.remoting.jnlp.Main$CuiListener status 
>> Handshaking
>> 2018-02-01 23:30:56 INFO hudson.remoting.jnlp.Main$CuiListener status 
>> Connecting to jenkins-master.example.com:53000
>> 2018-02-01 23:30:56 INFO hudson.remoting.jnlp.Main$CuiListener status 
>> Trying protocol: JNLP4-connect
>> 2018-02-01 23:30:57 INFO hudson.remoting.jnlp.Main$CuiListener status 
>> Remote identity confirmed: 2e:a8:0f:2c:95:86:84:c4:85:ee:52:d2:38:7b:3f:b5
>> 2018-02-01 23:30:58 INFO hudson.remoting.jnlp.Main$CuiListener status 
>> Connected
>> 2018-02-01 23:31:46 CONFIG jenkins.util.SystemProperties getString 
>> Property (not found): hudson.Main.development => null
>> 2018-02-01 23:31:46 CONFIG jenkins.util.SystemProperties getString 
>> Property (not found): hudson.Main.timeout => null
>> 2018-02-01 23:31:56 CONFIG jenkins.util.SystemProperties getString 
>> Property (not found): 
>> hudson.ProxyConfiguration.DEFAULT_CONNECT_TIMEOUT_MILLIS => null
>> 2018-02-01 23:32:33 CONFIG jenkins.util.SystemProperties getString 
>> Property (not found): hudson.util.Secret.provider => null
>> 2018-02-01 23:33:32 CONFIG jenkins.util.SystemProperties getString 
>> Property (not found): hudson.Util.noSymLink => null
>> 2018-02-01 23:33:32 CONFIG jenkins.util.SystemProperties getString 
>> Property (not found): hudson.Util.symlinkEscapeHatch => null
>> 2018-02-01 23:33:32 CONFIG jenkins.util.SystemProperties getString 
>> Property (not found): hudson.Util.maxFileDeletionRetries => null
>> 2018-02-01 23:33:32 CONFIG jenkins.util.SystemProperties getString 
>> Property (not found): hudson.Util.deletionRetryWait => null
>> 2018-02-01 23:33:32 CONFIG jenkins.util.SystemProperties getString 
>> Property (not found): hudson.Util.performGCOnFailedDelete => null
>>
>>
>> One minute later, disaster strikes:
>>
>> 2018-02-01 23:34:38 FINER 
>> org.jenkinsci.remoting.protocol.impl.BIONetworkLayer$Reader run 
>> [JNLP4-connect connection to 
>> jenkins-master.example.com/172.16.106.38:53000] Unexpected I/O exception
>> java.io.IOException: Connection timed out
>> at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>> at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>> at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>> at sun.nio.ch.IOUtil.read(IOUtil.java:197)
>> at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
>> at hudson.remoting.SocketChannelStream$1.read(SocketChannelStream.java:35)
>> at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:65)
>> at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:109)
>> at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
>> at 
>> java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:385)
>> at 
>> org.jenkinsci.remoting.protocol.impl.BIONetworkLayer$Reader.run(BIONetworkLayer.java:245)
>> at 
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> at hudson.remoting.Engine$1$1.run(Engine.java:94)
>> at java.lang.Thread.run(Thread.java:748)
>> 2018-02-01 23:34:38 FINEST org.jenkinsci.remoting.protocol.NetworkLayer 
>> onRecvClosed [JNLP4-connect connection to 
>> jenkins-master.example.com/172.16.106.38:53000] RECV Closed
>> 2018-02-01 23:34:38 FINEST org.jenkinsci.remoting.protocol.FilterLayer 
>> onRecvClosed [JNLP4-connect connection to 
>> jenkins-master.example.com/172.16.106.38:53000] RECV Closed
>> 2018-02-01 23:34:38 FINE org.jenkinsci.remoting.protocol.ApplicationLayer 
>> onRecvClosed [JNLP4-connect connection to 
>> jenkins-master.example.com/172.16.106.38:53000] RECV Closed
>> 2018-02-01 23:34:38 INFO hudson.remoting.jnlp.Main$CuiListener status 
>> Terminated
>> 2018-02-01 23:34:38 FINEST 
>> org.jenkinsci.remoting.protocol.IOHub$IOHubSelectorWatcher run Windows 
>> IOHub Watcher for IOHub#1: Selector[keys:0, gen:0] / pool-1-thread-2: 
>> Finished
>> 2018-02-01 23:34:38 FINE hudson.remoting.jnlp.Main main Engine has died
>> 2018-02-01 23:34:38 INFO hudson.plugins.swarm.Client run Retrying in 10 
>> seconds
>> 2018-02-01 23:34:38 WARNING hudson.remoting.UserRequest perform 
>> LinkageError while performing UserRequest:UserRPCRequest(27,hasGitRepo)
>> java.lang.NoClassDefFoundError: hudson/console/ConsoleAnnotationDescriptor
>> at hudson.util.StreamTaskListener._error(StreamTaskListener.java:166)
>> at hudson.util.StreamTaskListener.error(StreamTaskListener.java:175)
>> at 
>> org.jenkinsci.plugins.gitclient.CliGitAPIImpl.hasGitRepo(CliGitAPIImpl.java:259)
>> at hudson.plugins.git.GitAPI.hasGitRepo(GitAPI.java:232)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at 
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>> at 
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:498)
>> at 
>> hudson.remoting.RemoteInvocationHandler$RPCRequest.perform(RemoteInvocationHandler.java:922)
>> at 
>> hudson.remoting.RemoteInvocationHandler$RPCRequest.call(RemoteInvocationHandler.java:896)
>> at 
>> hudson.remoting.RemoteInvocationHandler$RPCRequest.call(RemoteInvocationHandler.java:853)
>> at hudson.remoting.UserRequest.perform(UserRequest.java:210)
>> at hudson.remoting.UserRequest.perform(UserRequest.java:53)
>> at hudson.remoting.Request$2.run(Request.java:358)
>> at 
>> hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
>> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>> at 
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> at hudson.remoting.Engine$1$1.run(Engine.java:94)
>> at java.lang.Thread.run(Thread.java:748)
>> Caused by: java.lang.ClassNotFoundException: 
>> hudson.console.ConsoleAnnotationDescriptor
>> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>> at hudson.remoting.RemoteClassLoader.findClass(RemoteClassLoader.java:171)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>> ... 20 more
>> 2018-02-01 23:34:38 FINE hudson.slaves.ChannelPinger$2 onClosed 
>> Terminating ping thread for JNLP4-connect connection to 
>> jenkins-master.example.com/172.16.106.38:53000
>> 2018-02-01 23:34:38 FINE hudson.remoting.PingThread run Ping thread for 
>> channel hudson.remoting.Channel@5cc1fe53:JNLP4-connect connection to 
>> jenkins-master.example.com/172.16.106.38:53000 is interrupted. 
>> Terminating
>> 2018-02-01 23:34:38 FINE org.jenkinsci.remoting.protocol.ApplicationLayer 
>> doCloseWrite [JNLP4-connect connection to 
>> jenkins-master.example.com/172.16.106.38:53000] Closing SEND
>> 2018-02-01 23:34:38 FINEST 
>> org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer processWrite 
>> [JNLP4-connect connection to 
>> jenkins-master.example.com/172.16.106.38:53000] APP ENCODE: 0 bytes
>> 2018-02-01 23:34:38 WARNING hudson.remoting.Request$2 run Failed to send 
>> back a reply to the request hudson.remoting.Request$2@2d486301
>> hudson.remoting.ChannelClosedException: Channel 
>> "hudson.remoting.Channel@5cc1fe53:JNLP4-connect connection to 
>> jenkins-master.example.com/172.16.106.38:53000": channel is already 
>> closed
>> at hudson.remoting.Channel.send(Channel.java:672)
>> at hudson.remoting.Request$2.run(Request.java:371)
>> at 
>> hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
>> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>> at 
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> at hudson.remoting.Engine$1$1.run(Engine.java:94)
>> at java.lang.Thread.run(Thread.java:748)
>> Caused by: java.nio.channels.ClosedChannelException
>> at 
>> org.jenkinsci.remoting.protocol.NetworkLayer.onRecvClosed(NetworkLayer.java:154)
>> at 
>> org.jenkinsci.remoting.protocol.impl.BIONetworkLayer.access$1800(BIONetworkLayer.java:48)
>> at 
>> org.jenkinsci.remoting.protocol.impl.BIONetworkLayer$Reader.run(BIONetworkLayer.java:264)
>> ... 4 more
>> 2018-02-01 23:34:38 FINEST 
>> org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer processWrite 
>> [JNLP4-connect connection to 
>> jenkins-master.example.com/172.16.106.38:53000] Handshake status: 
>> NOT_HANDSHAKING engine result: Status = CLOSED HandshakeStatus = 
>> NOT_HANDSHAKING
>> bytesConsumed = 0 bytesProduced = 85
>> 2018-02-01 23:34:38 FINEST org.jenkinsci.remoting.protocol.FilterLayer 
>> onRecvClosed [JNLP4-connect connection to 
>> jenkins-master.example.com/172.16.106.38:53000] RECV Closed
>> 2018-02-01 23:34:38 FINEST org.jenkinsci.remoting.protocol.FilterLayer 
>> doCloseSend [JNLP4-connect connection to 
>> jenkins-master.example.com/172.16.106.38:53000] Closing SEND
>> 2018-02-01 23:34:38 FINEST 
>> org.jenkinsci.remoting.protocol.impl.BIONetworkLayer doCloseSend 
>> [JNLP4-connect connection to 
>> jenkins-master.example.com/172.16.106.38:53000] Closing SEND
>> 2018-02-01 23:34:38 FINEST 
>> org.jenkinsci.remoting.protocol.impl.BIONetworkLayer$Reader run 
>> [JNLP4-connect connection to 
>> jenkins-master.example.com/172.16.106.38:53000] Reader thread stopped
>> 2018-02-01 23:34:48 CONFIG hudson.plugins.swarm.SwarmClient 
>> discoverFromMasterUrl discoverFromMasterUrl() invoked
>> 2018-02-01 23:34:48 CONFIG hudson.plugins.swarm.SwarmClient 
>> discoverFromMasterUrl Connecting to http://jenkins-master.example.com/ 
>> to configure swarm client.
>>
>>
>> Two minutes later, the swarm client manages to reconnect back to the 
>> master, but it is too late, because the job has already failed:
>>
>> 2018-02-01 23:34:48 FINE hudson.plugins.swarm.SwarmClient 
>> createHttpClient createHttpClient() invoked
>> 2018-02-01 23:34:48 FINE hudson.plugins.swarm.SwarmClient 
>> createHttpClient Setting HttpClient credentials based on options passed
>> 2018-02-01 23:36:47 FINER hudson.plugins.swarm.SwarmClient 
>> getChildElementString getChildElementString() invoked
>> 2018-02-01 23:36:47 FINE hudson.plugins.swarm.Candidate <init> Candidate 
>> constructed with url: http://jenkins-master.example.com/, secret: 
>> 17e24806-9248-4b9b-86aa-ce97caa34001
>> 2018-02-01 23:36:47 INFO hudson.plugins.swarm.Client run Attempting to 
>> connect to http://jenkins-master.example.com/ 
>> 17e24806-9248-4b9b-86aa-ce97caa34001 with ID 
>> 2018-02-01 23:36:51 SEVERE hudson.plugins.swarm.SwarmClient getCsrfCrumb 
>> Could not obtain CSRF crumb. Response code: 404
>> 2018-02-01 23:36:57 INFO hudson.remoting.jnlp.Main createEngine Setting 
>> up agent: swarm-client-agent.example.com
>> 2018-02-01 23:36:57 INFO hudson.remoting.jnlp.Main$CuiListener <init> 
>> Jenkins agent is running in headless mode.
>> 2018-02-01 23:36:57 WARNING hudson.remoting.Engine startEngine No Working 
>> Directory. Using the legacy JAR Cache location: 
>> /var/tmp/jenkins/.jenkins/cache/jars
>> 2018-02-01 23:36:57 INFO hudson.remoting.jnlp.Main$CuiListener status 
>> Locating server among [http://jenkins-master.example.com/]
>> 2018-02-01 23:36:57 INFO 
>> org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver resolve Remoting 
>> server accepts the following protocols: [JNLP4-connect, Ping]
>> 2018-02-01 23:36:57 INFO hudson.remoting.jnlp.Main$CuiListener status 
>> Agent discovery successful
>>   Agent address: jenkins-master.example.com
>>   Agent port:    53000
>>   Identity:      2e:a8:0f:2c:95:86:84:c4:85:ee:52:d2:38:7b:3f:b5
>> 2018-02-01 23:36:57 INFO hudson.remoting.jnlp.Main$CuiListener status 
>> Handshaking
>> 2018-02-01 23:36:57 INFO hudson.remoting.jnlp.Main$CuiListener status 
>> Connecting to jenkins-master.example.com:53000
>> 2018-02-01 23:36:57 INFO hudson.remoting.jnlp.Main$CuiListener status 
>> Trying protocol: JNLP4-connect
>> 2018-02-01 23:36:57 INFO hudson.remoting.jnlp.Main$CuiListener status 
>> Remote identity confirmed: 2e:a8:0f:2c:95:86:84:c4:85:ee:52:d2:38:7b:3f:b5
>> 2018-02-01 23:36:57 INFO hudson.remoting.jnlp.Main$CuiListener status 
>> Connected
>> 2018-02-01 23:37:06 CONFIG jenkins.util.SystemProperties getString 
>> Property (not found): hudson.Main.development => null
>> 2018-02-01 23:37:06 CONFIG jenkins.util.SystemProperties getString 
>> Property (not found): hudson.Main.timeout => null
>>
>>
>> I also have FINE and FINEST level logs for the rest of the time on the 
>> swarm client side, but I didn’t attach them to this email because they 
>> don’t seem relevant.
>>
>> My initial questions:
>>
>>    1. From a timing perspective, the first error is at 2018-02-01 
>>    23:34:38 when the swarm client gets an "Unexpected I/O exception 
>>    java.io.IOException: Connection timed out" when reading from the master. 
>> I 
>>    then see a bunch of "onRecvClosed" log statements on the swarm client 
>>    followed by "Terminated" and then attempts to reconnect to the master. 
>> The 
>>    swarm client is in AWS and the master is in a local colo, so it doesn't 
>>    surprise me that network connections occasionally time out. But why does 
>>    the master kill the job with java.nio.channels.ClosedChannelException 
>>    rather than waiting for the client to reconnect? The swarm client did in 
>>    fact reconnect successfully (its logs show this), but by then the job had 
>>    died on the master.
>>    2. Is the "hudson.remoting.UserRequest perform LinkageError while 
>>    performing UserRequest:UserRPCRequest(27,hasGitRepo) 
>>    java.lang.NoClassDefFoundError: 
>> hudson/console/ConsoleAnnotationDescriptor" 
>>    a red herring or a relevant error? Should I be exploring this further? As 
>>    far as I know, I have all relevant plugins installed and up-to-date.
>>    3. Who closed the connection first here, the client or the Jenkins 
>>    master? If the problem is indeed being exposed by my network, where 
>> should 
>>    I begin looking for dropped packets or congestion?
>>
>> Thanks in advance for any help,
>> Basil
>>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Jenkins Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to jenkinsci-dev+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/jenkinsci-dev/f4eb3fc2-2a6c-4a8a-8415-a816843cd3df%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to