>
>
>>
>> From your reply, I am even more concerned by the disproportionately high
>> number of blocked threads (120) compared to offline slaves (2 at the
>> time), as it sounds like it should be closer to 1:1?
>>
>
> Yes, it sounds like there is a race condition between the post-disconnect
> tasks and the reconnect tasks:
> https://github.com/jenkinsci/ssh-slaves-plugin/blob/ssh-slaves-1.6/src/main/java/hudson/plugins/sshslaves/SSHLauncher.java#L1152
> is blocking until the slave is connected... but the slave cannot connect
> until the disconnect tasks are complete...
>
>
>>
>>
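That circular wait, if it's what's happening, reduces to something like this
toy of mine (hypothetical names, not the actual plugin code; the real logic
lives in SSHLauncher and the remoting executors):

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Toy reduction of the suspected race: "reconnect" blocks until the
// disconnect cleanup finishes, but both land on the same single-lane
// executor, so the cleanup never gets a thread and everything hangs.
public class ReconnectDeadlock {
    public static void main(String[] args) {
        ExecutorService lane = Executors.newSingleThreadExecutor();
        final CountDownLatch cleanupDone = new CountDownLatch(1);

        lane.submit(new Runnable() {          // the "reconnect" task
            public void run() {
                try {
                    cleanupDone.await();      // waits for cleanup...
                } catch (InterruptedException ignored) {
                }
                System.out.println("reconnected");
            }
        });
        lane.submit(new Runnable() {          // the "disconnect cleanup" task
            public void run() {
                cleanupDone.countDown();      // ...which never runs: the lane's
            }                                 // only thread is parked above
        });
    }
}

Scale that up and it would look a lot like 120 threads stuck behind 2
offline slaves.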
Do you have 'dead' slaves, and what's your logging configuration like?

I'm tracking down a similar problem, in that our Jenkins instance (which
isn't that large) slows to the point where the UI times out.

Taking occasional stack dumps (this is an early guess, could be very wrong)
shows, basically, the UI waiting to get access to
java.util.logging.ConsoleHandler.

E.g.:

- waiting to lock <0x00000000804285c0> (a java.util.logging.ConsoleHandler)
        at java.util.logging.ConsoleHandler.publish(ConsoleHandler.java:105)
        at java.util.logging.Logger.log(Logger.java:565)
        at java.util.logging.Logger.doLog(Logger.java:586)
        at java.util.logging.Logger.logp(Logger.java:702)
        at org.apache.commons.logging.impl.Jdk14Logger.log(Jdk14Logger.java:87)
        at org.apache.commons.logging.impl.Jdk14Logger.trace(Jdk14Logger.java:239)
        at org.apache.commons.beanutils.BeanUtilsBean.copyProperty(BeanUtilsBean.java:372)
        ... etc etc down to the caller



Now - the interesting thing is that the trace seems to be going through
Apache Commons Logging, then JUL. But I get nothing on the console, so it's
either throwing an exception because of a misconfiguration, or it's only
checking whether we actually wanted this output after acquiring the lock.
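It looks like the latter, if I'm reading the JDK right:
StreamHandler.publish() is synchronized, and the handler-level check happens
inside it. Here's a toy of mine (not Jenkins code) that reproduces the shape
of the contention:

import java.util.logging.ConsoleHandler;
import java.util.logging.Level;
import java.util.logging.Logger;

// Two threads spamming records the handler will discard anyway: every
// discarded record still takes the ConsoleHandler monitor, because the
// level check happens inside the synchronized publish(). Run this under
// jstack and you see one thread "locked" and the other "waiting to lock"
// the ConsoleHandler, just like the traces above.
public class HandlerLockDemo {
    public static void main(String[] args) throws InterruptedException {
        ConsoleHandler handler = new ConsoleHandler();
        handler.setLevel(Level.WARNING);      // handler discards < WARNING

        final Logger logger = Logger.getLogger("demo");
        logger.setUseParentHandlers(false);
        logger.addHandler(handler);
        logger.setLevel(Level.ALL);           // logger lets everything through

        Runnable spam = new Runnable() {
            public void run() {
                for (int i = 0; i < 1000000; i++) {
                    logger.log(Level.FINEST, "discarded, but still synchronized");
                }
            }
        };
        Thread a = new Thread(spam, "spam-1");
        Thread b = new Thread(spam, "spam-2");
        a.start(); b.start();
        a.join(); b.join();
    }
}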

Either way, unsurprisingly I don't care about trace logs from Apache Commons
BeanUtils! ;-) I suspect someone may have adjusted our logging while trying
to track something down.
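The cheap fix for that half of it is to raise the *logger* level, so the
record is dropped before the handler lock is ever touched - either an
"org.apache.commons.beanutils.level = INFO" line in logging.properties, or
programmatically, something like:

import java.util.logging.Level;
import java.util.logging.Logger;

// Raising the logger level means Logger.log() drops the record in its own
// isLoggable() check, before the handler monitor comes into play. JUL only
// holds loggers weakly, so keep a strong reference or the setting can be
// garbage-collected away.
public class QuietBeanUtils {
    static final Logger BEANUTILS =
            Logger.getLogger("org.apache.commons.beanutils");

    static {
        BEANUTILS.setLevel(Level.INFO); // discards commons-logging trace/debug
    }
}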

The second interesting thing is that, a lot of the time, the console lock is
being held by Computer.threadPoolForRemoting. E.g.:


... etc etc

        at java.util.logging.StreamHandler.publish(StreamHandler.java:196)
        - locked <0x00000000804285c0> (a java.util.logging.ConsoleHandler)
        at java.util.logging.ConsoleHandler.publish(ConsoleHandler.java:105)
        at java.util.logging.Logger.log(Logger.java:565)
        at java.util.logging.Logger.doLog(Logger.java:586)
        at java.util.logging.Logger.log(Logger.java:675)
        at hudson.remoting.ProxyOutputStream$Chunk$1.run(ProxyOutputStream.java:285)
        at hudson.remoting.PipeWriter$1.run(PipeWriter.java:158)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:111)
        at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
        at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)


Again, it's one of those pesky warnings that never actually ends up on the
console, but what it's doing is:

LOGGER.log(Level.WARNING, "Failed to ack the stream", e);


It seems like it's running that a lot (which I suspected might be for
non-working slaves). I think it formats the exception's stack trace, which
is expensive (and, helpfully, JUL does all that whilst holding onto the
console lock... >:-S ) - which may be why the responsiveness gets crushed.

Anyway, HTMH and back to digging...
