|
Since Jenkins core 1.612, Java 7 is required for both core and agents. During a migration, a user may forget to upgrade the JVM of an agent. Running an agent on Java 6 is not supported, but the annoying part is that it causes significant memory consumption, because the connection fails repeatedly with a Java compatibility error that is not caught correctly. The problem was originally
Here is the analysis done by stephenconnolly:
I suspect that the J6 may be causing other leaks, as it is probably blowing up in unexpected places:
```
May 31, 2016 2:21:13 PM hudson.TcpSlaveAgentListener$ConnectionHandler run
INFO: Accepted connection #64 from /127.0.0.1:54507
May 31, 2016 2:21:13 PM hudson.TcpSlaveAgentListener$ConnectionHandler run
WARNING: Connection #64 failed
java.io.IOException: Remote call on jnlp failed
    at hudson.remoting.Channel.call(Channel.java:789)
    at hudson.slaves.SlaveComputer.setChannel(SlaveComputer.java:508)
    at jenkins.slaves.JnlpSlaveAgentProtocol$Handler.jnlpConnect(JnlpSlaveAgentProtocol.java:126)
    at jenkins.slaves.DefaultJnlpSlaveReceiver.handle(DefaultJnlpSlaveReceiver.java:70)
    at jenkins.slaves.JnlpSlaveAgentProtocol2$Handler2.run(JnlpSlaveAgentProtocol2.java:57)
    at jenkins.slaves.JnlpSlaveAgentProtocol2.handle(JnlpSlaveAgentProtocol2.java:30)
    at hudson.TcpSlaveAgentListener$ConnectionHandler.run(TcpSlaveAgentListener.java:156)
Caused by: java.lang.ClassFormatError: Failed to load hudson.slaves.SlaveComputer$SlaveVersion
    at hudson.remoting.RemoteClassLoader.loadClassFile(RemoteClassLoader.java:340)
    at hudson.remoting.RemoteClassLoader.findClass(RemoteClassLoader.java:251)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:249)
    at hudson.remoting.MultiClassLoaderSerializer$Input.resolveClass(MultiClassLoaderSerializer.java:114)
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1591)
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1496)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1750)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1329)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:349)
    at hudson.remoting.UserRequest.deserialize(UserRequest.java:184)
    at hudson.remoting.UserRequest.perform(UserRequest.java:98)
    at hudson.remoting.UserRequest.perform(UserRequest.java:48)
    at hudson.remoting.Request$2.run(Request.java:326)
    at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
    at hudson.remoting.Engine$1$1.run(Engine.java:62)
    at java.lang.Thread.run(Thread.java:695)
    at ......remote call to jnlp(Native Method)
    at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1416)
    at hudson.remoting.UserResponse.retrieve(UserRequest.java:220)
    at hudson.remoting.Channel.call(Channel.java:781)
    ... 6 more
Caused by: java.lang.UnsupportedClassVersionError: hudson/slaves/SlaveComputer$SlaveVersion : Unsupported major.minor version 51.0
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClassCond(ClassLoader.java:637)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:621)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:471)
    at hudson.remoting.RemoteClassLoader.loadClassFile(RemoteClassLoader.java:338)
    at hudson.remoting.RemoteClassLoader.findClass(RemoteClassLoader.java:251)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:249)
    at hudson.remoting.MultiClassLoaderSerializer$Input.resolveClass(MultiClassLoaderSerializer.java:114)
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1591)
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1496)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1750)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1329)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:349)
    at hudson.remoting.UserRequest.deserialize(UserRequest.java:184)
    at hudson.remoting.UserRequest.perform(UserRequest.java:98)
    at hudson.remoting.UserRequest.perform(UserRequest.java:48)
    at hudson.remoting.Request$2.run(Request.java:326)
    at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
    at hudson.remoting.Engine$1$1.run(Engine.java:62)
    at java.lang.Thread.run(Thread.java:695)
```
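A note on the last `Caused by`: `major.minor version 51.0` is the class file version emitted for Java 7, and a Java 6 JVM accepts class files only up to major version 50, which is why it cannot define the classes it receives over the channel. The sketch below illustrates that check; `ClassFileVersion` and its methods are hypothetical names for illustration and are not part of Jenkins or remoting.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper for illustration; not a Jenkins or remoting class. */
final class ClassFileVersion {
    private ClassFileVersion() {}

    /** Reads the major version from a class file header: u4 magic, u2 minor, u2 major. */
    static int majorVersion(byte[] classBytes) {
        ByteBuffer buf = ByteBuffer.wrap(classBytes); // big-endian by default, like class files
        if (buf.remaining() < 8 || buf.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a class file");
        }
        buf.getShort(); // minor version, not needed here
        return buf.getShort() & 0xFFFF;
    }

    /** Maps a major version to its Java release: 50 -> 6, 51 -> 7 (holds from major 49, Java 5). */
    static int javaRelease(int major) {
        return major - 44;
    }

    /** True if a JVM of the given release can load a class with this major version. */
    static boolean loadableOn(int jvmRelease, int classMajor) {
        return javaRelease(classMajor) <= jvmRelease;
    }
}
```

With the version from the trace, `ClassFileVersion.loadableOn(6, 51)` is false and `ClassFileVersion.loadableOn(7, 51)` is true.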
I think this is an issue in Jenkins Core, to wit:
All of this should be in a try ... catch block and we should probably close the channel if any of that fails.
Instead what is happening is that the channel remains semi-half-open:
- The slave side thinks it is closed but the Jenkins side does not.
- Because we have not set the slave's channel field, subsequent connection attempts will not be rejected due to an existing connection. In fact nothing is really retaining a reference to the channel, and we never got to set up the ping thread, so at best we are awaiting the OS to decide the socket is dead.
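The try ... catch suggestion above could look roughly like the following. This is a minimal sketch under stated assumptions, not the actual Jenkins code: `ChannelSetupSketch`, `SetupStep`, and `setUpOrClose` are hypothetical names, and a plain `java.io.Closeable` stands in for the remoting channel.

```java
import java.io.Closeable;
import java.io.IOException;

/** Hypothetical stand-ins for illustration; these are not the Jenkins or remoting classes. */
final class ChannelSetupSketch {
    private ChannelSetupSketch() {}

    /** Whatever has to run right after the connection is accepted (assumed shape). */
    interface SetupStep {
        void run() throws Exception;
    }

    /**
     * Runs the post-connect setup. If any part of it fails, the channel is closed
     * before the failure is reported, so no half-open channel is left behind.
     */
    static void setUpOrClose(Closeable channel, SetupStep setup) throws IOException {
        try {
            setup.run();
        } catch (Throwable failure) { // Throwable, so Errors such as ClassFormatError are covered too
            try {
                channel.close();
            } catch (IOException closeFailure) {
                failure.addSuppressed(closeFailure); // keep the original failure as the primary one
            }
            throw new IOException("Channel setup failed; channel closed", failure);
        }
    }
}
```

The point of catching `Throwable` in this sketch is that the failure may surface as an `Error` rather than an `Exception`; whether wrapping it in an `IOException` is the right reporting choice is a design decision left open here.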
Using a `while true ; do java -jar slave.jar -noReconnect -jnlpUrl ... ; done` loop you can trigger the issue faster:
The memory will be reclaimed once the connection is old enough to have been deemed dead by the TCP stack, but I had one slave with at most one partially set-up connection and the Channel instances just kept on growing. Every so often a full GC gets a few connections to drop off, but there would still be loads of them "live".
after a short while
![]()
after some more time
![]()
(next I let it run a little more then stopped the slave and triggered a full GC)
![]()
notice GC doesn't make much of a dent
1m40s later we were able to get GC to collect another instance, leaving loads still hanging around:
![]()
after another forced GC
The workaround is obviously not to have a J6 slave.
|