For us, I think we actually have several things going on:
- Something is consuming far more memory than before; bumping up Xmx
significantly seemed to help, though.
- That in turn was causing the host (SmartOS / Solaris) to run out of swap,
because fork() reserves swap for the whole JVM image, so I had to
significantly raise tmpfs too.
- We habitually run under screen, so people can log in and re-attach to the
session to check what's going on. That seems b0rked (god alone knows why,
it's been fine for years): multiple threads lock up waiting for this
monitor:
"Computer.threadPoolForRemoting [#51] : IO ID=7610 : seq#=7609" daemon
prio=3 tid=0x00000000037bb800 nid=0x242 runnable [0xfffffd7fec2fd000]
   java.lang.Thread.State: RUNNABLE
        at java.io.FileOutputStream.writeBytes(Native Method)
        at java.io.FileOutputStream.write(FileOutputStream.java:318)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
        - locked <0x00000000a2692488> (a java.io.BufferedOutputStream)
        at java.io.PrintStream.write(PrintStream.java:480)
        - locked <0x00000000a2692468> (a java.io.PrintStream)
        at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
        at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:291)
        at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:295)
        at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:141)
        - locked <0x00000000a26a1cf8> (a java.io.OutputStreamWriter)
        at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:229)
        at java.util.logging.StreamHandler.flush(StreamHandler.java:242)
        - locked <0x00000000a26a1c88> (a java.util.logging.ConsoleHandler)   <-- all web UI hangs waiting on this
        at java.util.logging.ConsoleHandler.publish(ConsoleHandler.java:106)
        at java.util.logging.Logger.log(Logger.java:565)
        at java.util.logging.Logger.doLog(Logger.java:586)
        at java.util.logging.Logger.log(Logger.java:675)
        at hudson.remoting.ProxyOutputStream$Chunk$1.run(ProxyOutputStream.java:271)
        at hudson.remoting.PipeWriter$1.run(PipeWriter.java:158)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:111)
        at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:722)
Not running under screen seemed to solve that issue.
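If the screen hang really is stdout back-pressure stalling ConsoleHandler (a detached screen that stops draining the pty would do it), one workaround worth trying -- entirely hypothetical on my part, and with illustrative file paths -- is to route the java.util.logging output to a file handler instead of the console:

```shell
# Hypothetical workaround -- all paths here are illustrative.
# Send java.util.logging output to a rotating file instead of
# ConsoleHandler, so a blocked stdout can't stall every thread
# that tries to log.
cat > jenkins-logging.properties <<'EOF'
handlers = java.util.logging.FileHandler
java.util.logging.FileHandler.pattern = /var/log/jenkins/jenkins-%g.log
java.util.logging.FileHandler.limit = 10485760
java.util.logging.FileHandler.count = 5
java.util.logging.FileHandler.formatter = java.util.logging.SimpleFormatter
EOF

# Then start Jenkins with:
#   java -Djava.util.logging.config.file=jenkins-logging.properties -jar jenkins.war
```

No idea yet whether that helps under screen; it just takes ConsoleHandler out of the picture entirely.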
Sadly I can't seem to persuade JMX to work properly, so it's a bit of a
black box at the moment, though I have a couple of heap dumps I'll walk
through when I have the time.
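For anyone stuck in the same spot: the flags below are roughly what remote JMX wants (the port, hostname and disabled auth/SSL are illustrative examples, not what we run); on a multi-homed Solaris box java.rmi.server.hostname is often the missing piece. Failing JMX, jmap/jstack straight against the PID work fine for dumps:

```shell
# Illustrative JVM flags for remote JMX -- port/hostname are examples,
# and disabling auth/SSL is only sane on a trusted network.
JAVA_OPTS="$JAVA_OPTS \
  -Dcom.sun.management.jmxremote \
  -Dcom.sun.management.jmxremote.port=9010 \
  -Dcom.sun.management.jmxremote.authenticate=false \
  -Dcom.sun.management.jmxremote.ssl=false \
  -Djava.rmi.server.hostname=jenkins.example.com"

# Plan B: dumps straight from the command line (replace <pid>):
#   jmap -dump:live,format=b,file=/var/tmp/jenkins.hprof <pid>
#   jstack -l <pid> > /var/tmp/jenkins.threads.txt
```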
On Mon, Dec 9, 2013 at 2:41 PM, Tim Drury <[email protected]> wrote:
> I intended to install 1.532 on Friday, but mistakenly installed 1.539. It
> gave us the same OOM exceptions. I'm installing 1.532 now and will -
> hopefully - know tomorrow whether it's stable or not. I'm not exactly sure
> what's going to happen with our plugins though. Hopefully Jenkins will
> tell me if they must be downgraded too.
>
> -tim
>
>
> On Monday, December 9, 2013 7:45:28 AM UTC-5, Stephen Connolly wrote:
>
>> How does the current LTS (1.532.1) hold up?
>>
>>
>> On 6 December 2013 13:33, Tim Drury <[email protected]> wrote:
>>
>>> We updated Jenkins to 1.542 two days ago (from 1.514) and we're getting
>>> a lot of OOM errors. (Environment: Windows Server 2008 R2; the Jenkins
>>> JVM is 64-bit JDK 1.6.0_26.)
>>>
>>> At first I did the simplest thing and increased the heap from 3G to 4.2G
>>> (and bumped up permgen). This didn't help, so I started looking at threads
>>> via the Jenkins monitoring tool. It indicated the disk-usage plugin was
>>> hung. When you tried to view a page for a particularly large job, the page
>>> would "hang" and the stack trace showed the disk-usage plugin was to blame
>>> (or so I thought). Jira report with thread dump here:
>>> https://issues.jenkins-ci.org/browse/JENKINS-20876
>>>
>>> We disabled the disk-usage plugin and restarted, and now we can visit
>>> that job page. However, we still get OOM errors and lots of GCs in the
>>> logs at least once a day, and the stack trace looks frighteningly
>>> similar to the disk-usage one. Here is an edited stack trace showing
>>> the methods common to both OOM incidents, one with the disk-usage
>>> plugin enabled and one after it was disabled:
>>>
>>> [lots of xstream methods snipped]
>>> hudson.XmlFile.unmarshal(XmlFile.java:165)
>>> hudson.model.Run.reload(Run.java:323)
>>> hudson.model.Run.<init>(Run.java:312)
>>> hudson.model.AbstractBuild.<init>(AbstractBuild.java:185)
>>> hudson.maven.AbstractMavenBuild.<init>(AbstractMavenBuild.java:54)
>>> hudson.maven.MavenModuleSetBuild.<init>(MavenModuleSetBuild.java:146)
>>> ... [JVM methods snipped]
>>> hudson.model.AbstractProject.loadBuild(AbstractProject.java:1155)
>>> hudson.model.AbstractProject$1.create(AbstractProject.java:342)
>>> hudson.model.AbstractProject$1.create(AbstractProject.java:340)
>>> hudson.model.RunMap.retrieve(RunMap.java:225)
>>> hudson.model.RunMap.retrieve(RunMap.java:59)
>>> jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:677)
>>> jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:660)
>>> jenkins.model.lazy.AbstractLazyLoadRunMap.search(AbstractLazyLoadRunMap.java:502)
>>> jenkins.model.lazy.AbstractLazyLoadRunMap.getByNumber(AbstractLazyLoadRunMap.java:536)
>>> hudson.model.AbstractProject.getBuildByNumber(AbstractProject.java:1077)
>>> hudson.maven.MavenBuild.getParentBuild(MavenBuild.java:165)
>>> hudson.maven.MavenBuild.getWhyKeepLog(MavenBuild.java:273)
>>> hudson.model.Run.isKeepLog(Run.java:572)
>>> ...
>>>
>>> It seems something in "core" Jenkins has changed and not for the better.
>>> Anyone seeing these issues?
>>>
>>> -tim
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "Jenkins Users" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>>
>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>
>>