"hudson.remoting.ChannelClosedException: channel is already closed" indicates an unexpected loss of connection to the slave. The nested "Caused by: java.io.EOFException" indicates that the slave side has shut down the communication with the master.

The thing is, the communication to the slave (the InputStream that Channel reads) is tunneled over several layers, and the only way this part of the code discovers the problem is by InputStream.read() returning -1.

This design of InputStream does not allow us to report the underlying cause of the communication problem through a chained exception, so we really can't properly report the root cause.
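
To illustrate the limitation, here is a minimal standalone sketch (not the actual remoting code; the class and method names are made up for this example). A reader that sees -1 can only raise an EOFException of its own, and there is no underlying Throwable available to chain:

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

public class EofWithoutCause {
    /** Reads one byte, turning the -1 sentinel into an EOFException. */
    static int readOrThrow(InputStream in) throws IOException {
        int b = in.read();
        if (b < 0) {
            // All we know here is "the stream ended". Whatever went wrong
            // in the layers underneath is not visible, so nothing can be
            // passed along as the cause.
            throw new EOFException("unexpected end of stream");
        }
        return b;
    }

    /** Drains the stream and returns the cause attached to the resulting exception. */
    static Throwable causeOfEof(InputStream in) {
        try {
            while (true) {
                readOrThrow(in);
            }
        } catch (IOException e) {
            return e.getCause();
        }
    }

    public static void main(String[] args) {
        InputStream in = new ByteArrayInputStream(new byte[] {1, 2});
        System.out.println("cause = " + causeOfEof(in)); // prints "cause = null"
    }
}
```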

The slave console log normally does capture the last dying message from the slave JVM or any transport-level errors, but it gets rotated as soon as the next connection attempt starts. The old file is still available under $JENKINS_HOME, but there's no way to look at it from the web UI. Jenkins auto-reconnects failed slaves pretty aggressively, and it takes some time for someone to notice a build that failed with ChannelClosedException and try to understand what's going on, which makes troubleshooting even trickier.

I was just sweeping the ssh-slaves plugin ticket backlog, and there are many reports of this same issue, so this clearly is a gap in the diagnosability of the slave connectivity.

If anyone has a good idea of how to capture the errors, that'd be greatly appreciated.


One approach I've been thinking about is to introduce a proper log rotation mechanism (one that handles LargeText.doProgressText() correctly), and somehow use that to let people scroll back through the slave console log.
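
As a rough sketch of the rotation part only (it does not touch LargeText.doProgressText() or the web UI), something like the following would keep a few generations of the log around. The class name, the "slave.log" file name, and the ".1", ".2" suffix scheme are assumptions made for this example, not how Jenkins stores these files today:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class LogRotationSketch {
    /** Hypothetical naming scheme: slave.log -> slave.log.1 -> slave.log.2 ... */
    static Path generation(Path log, int n) {
        return log.resolveSibling(log.getFileName() + "." + n);
    }

    /** Shifts older generations up by one, keeping at most 'keep' (>= 1) old logs. */
    static void rotate(Path log, int keep) throws IOException {
        for (int i = keep - 1; i >= 1; i--) {
            Path from = generation(log, i);
            if (Files.exists(from)) {
                Files.move(from, generation(log, i + 1), StandardCopyOption.REPLACE_EXISTING);
            }
        }
        if (Files.exists(log)) {
            Files.move(log, generation(log, 1), StandardCopyOption.REPLACE_EXISTING);
        }
    }

    /** Simulates two connection attempts and returns the contents of the two old logs. */
    static String demo() {
        try {
            Path dir = Files.createTempDirectory("slave-logs");
            Path log = dir.resolve("slave.log");
            Files.write(log, "first".getBytes(StandardCharsets.UTF_8));
            rotate(log, 3); // a new connection attempt starts
            Files.write(log, "second".getBytes(StandardCharsets.UTF_8));
            rotate(log, 3); // and another one
            String newest = new String(Files.readAllBytes(generation(log, 1)), StandardCharsets.UTF_8);
            String older = new String(Files.readAllBytes(generation(log, 2)), StandardCharsets.UTF_8);
            return newest + "," + older;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

With the older generations kept on disk, the log from the connection that actually died would still be there to look at after the auto-reconnect.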

Perhaps another possibility is to let the ComputerLauncher record a connection loss as an Exception on a failing Channel.
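
Very roughly, that second idea could look like the sketch below. TrackedChannel and recordConnectionLoss() are hypothetical names invented for this example; they are not existing hudson.remoting or ComputerLauncher APIs. The point is only that whoever owns the transport records why it died, and later senders get that as the chained cause:

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicReference;

public class ConnectionLossSketch {
    /** Hypothetical stand-in for a channel that remembers why it was lost. */
    static class TrackedChannel {
        private final AtomicReference<Throwable> lossCause = new AtomicReference<>();
        private volatile boolean closed;

        /** Called by whoever owns the transport (e.g. the launcher) when it sees the failure. */
        void recordConnectionLoss(Throwable cause) {
            lossCause.compareAndSet(null, cause); // keep only the first recorded cause
            closed = true;
        }

        /** Fails with the recorded cause chained in, if there is one. */
        void send(String command) throws IOException {
            if (closed) {
                throw new IOException("channel is already closed", lossCause.get());
            }
            // A real channel would write the command to the transport here.
        }
    }

    /** Shows what a caller would see after a recorded connection loss. */
    static String demo() {
        TrackedChannel ch = new TrackedChannel();
        ch.recordConnectionLoss(new IOException("ssh connection reset"));
        try {
            ch.send("delete script file");
            return "sent";
        } catch (IOException e) {
            return e.getMessage() + " <- " + e.getCause().getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```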



On 04/17/2013 02:41 PM, hajush wrote:
The intermittent failure of slave jobs due to issue  12235
<https://issues.jenkins-ci.org/browse/JENKINS-12235>   looks like it might
start undoing progress in getting my work teams to adopt Jenkins.

Has anyone given any thought to the issue and how to address it? Some folks
had luck by increasing the ClientInterval on unix masters - but others did
not.

I see that late last month Kohsuke increased the pipe window size in
hudson.remoting.Channel - though I'm not sure that would address this - and
since it's intermittent - it's hard to test. Here's what our stack trace
failure looks like.

FATAL: Unable to delete script file c:\temp\hudson985794291407431615.bat
hudson.util.IOException2: remote file operation failed: c:\temp\hudson985794291407431615.bat at hudson.remoting.Channel@e553b0:vcvmwin061
        at hudson.FilePath.act(FilePath.java:848)
        at hudson.FilePath.act(FilePath.java:825)
        at hudson.FilePath.delete(FilePath.java:1202)
        at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:101)
        at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:60)
        at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
        at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:810)
        at hudson.model.Build$BuildExecution.build(Build.java:199)
        at hudson.model.Build$BuildExecution.doRun(Build.java:160)
        at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:592)
        at hudson.model.Run.execute(Run.java:1543)
        at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
        at hudson.model.ResourceController.execute(ResourceController.java:88)
        at hudson.model.Executor.run(Executor.java:236)
Caused by: hudson.remoting.ChannelClosedException: channel is already closed
        at hudson.remoting.Channel.send(Channel.java:494)
        at hudson.remoting.Request.call(Request.java:129)
        at hudson.remoting.Channel.call(Channel.java:672)
        at hudson.FilePath.act(FilePath.java:841)







--
Kohsuke Kawaguchi | CloudBees, Inc. | http://cloudbees.com/
Try Nectar, our professional version of Jenkins
