I'm doing a heap-dump analysis now and I think I might know what the issue
was. The start of this whole problem was the disk-usage plugin hanging our
attempts to view a job in Jenkins (see
https://issues.jenkins-ci.org/browse/JENKINS-20876) so we disabled that
plugin. After disabling it, Jenkins complained about data in an
older/unreadable format:
    "You have data stored in an older format and/or unreadable data."
If I click the "Manage" button to delete it, the page takes a _long_ time
to display all the disk-usage plugin data - there must be thousands of
rows - but it does display it all eventually. The error shown in each row
is:
CannotResolveClassException: hudson.plugins.disk_usage.BuildDiskUsageAction
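(For reference, here's a quick way to see how many build records still carry
that orphaned action, run from the script console. This is just a minimal
Groovy sketch assuming the standard JENKINS_HOME/jobs layout; note that
XStream escapes the underscore in the package name as a double underscore on
disk, so the element to look for is
hudson.plugins.disk__usage.BuildDiskUsageAction.)

    import jenkins.model.Jenkins
    import groovy.io.FileType

    // Sketch: scan every build.xml under the jobs directory and count the
    // files that still contain a serialized disk-usage action. XStream
    // escapes '_' as '__', hence the double underscore below.
    def jobsDir = new File(Jenkins.instance.rootDir, 'jobs')
    int hits = 0
    jobsDir.eachFileRecurse(FileType.FILES) { f ->
        if (f.name == 'build.xml' && f.text.contains('disk__usage.BuildDiskUsageAction')) {
            hits++
        }
    }
    println("build.xml files with leftover disk-usage data: " + hits)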
If I click "Discard Unreadable Data" at the bottom of the page, I quickly
get a stack trace:
javax.servlet.ServletException: java.util.ConcurrentModificationException
    at org.kohsuke.stapler.Stapler.tryInvoke(Stapler.java:735)
    at org.kohsuke.stapler.Stapler.invoke(Stapler.java:799)
    at org.kohsuke.stapler.MetaClass$6.doDispatch(MetaClass.java:239)
    at org.kohsuke.stapler.NameBasedDispatcher.dispatch(NameBasedDispatcher.java:53)
    at org.kohsuke.stapler.Stapler.tryInvoke(Stapler.java:685)
    at org.kohsuke.stapler.Stapler.invoke(Stapler.java:799)
    at org.kohsuke.stapler.Stapler.invoke(Stapler.java:587)
    at org.kohsuke.stapler.Stapler.service(Stapler.java:218)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:45)
    at winstone.ServletConfiguration.execute(ServletConfiguration.java:248)
    at winstone.RequestDispatcher.forward(RequestDispatcher.java:333)
    at winstone.RequestDispatcher.doFilter(RequestDispatcher.java:376)
    at hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:96)
    at net.bull.javamelody.MonitoringFilter.doFilter(MonitoringFilter.java:203)
    at net.bull.javamelody.MonitoringFilter.doFilter(MonitoringFilter.java:181)
    at net.bull.javamelody.PluginMonitoringFilter.doFilter(PluginMonitoringFilter.java:86)
and it fails to discard the data. Old data usually isn't a problem, so I
brushed this error off. However, here is the dominator tree of the heap dump:
Class Name                                                                    | Shallow Heap | Retained Heap | Percentage
-------------------------------------------------------------------------------------------------------------------------
hudson.diagnosis.OldDataMonitor @ 0x6f9f2c4a0                                 |           24 | 3,278,466,984 |     88.69%
com.thoughtworks.xstream.converters.SingleValueConverterWrapper @ 0x6f9da8780 |           16 |    13,825,616 |      0.37%
hudson.model.Hudson @ 0x6f9b8b8e8                                             |          272 |     3,572,400 |      0.10%
org.eclipse.jetty.webapp.WebAppClassLoader @ 0x6f9a73598                      |           88 |     2,308,760 |      0.06%
org.apache.commons.jexl.util.introspection.Introspector @ 0x6fbb74710         |           32 |     1,842,392 |      0.05%
org.kohsuke.stapler.WebApp @ 0x6f9c0ff10                                      |           64 |     1,127,480 |      0.03%
java.lang.Thread @ 0x7d5c2d138 Handling GET /view/Alle/job/common-translation-main/ : RequestHandlerThread[#105] Thread | 112 | 971,336 | 0.03%
-------------------------------------------------------------------------------------------------------------------------
What is hudson.diagnosis.OldDataMonitor? Could the disk-usage plugin data
be the cause of all my recent OOM errors? If so, how do I get rid of it?
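In case it helps, here's what I plan to try from the script console instead
of the "Discard Unreadable Data" button, since the button dies with the
ConcurrentModificationException above. It's only a sketch, assuming
OldDataMonitor.getData() is accessible from Groovy: re-saving each flagged
object should rewrite its XML without the unreadable fragment, which as far
as I can tell is what the discard button does anyway. Copying the key set
first should sidestep the concurrent-modification problem:

    import hudson.diagnosis.OldDataMonitor
    import hudson.model.AdministrativeMonitor
    import hudson.model.Saveable

    // Sketch: locate the OldDataMonitor among the administrative monitors
    // and re-save every object it flagged, persisting each one without the
    // unreadable disk-usage fragment.
    for (AdministrativeMonitor m : AdministrativeMonitor.all()) {
        if (m instanceof OldDataMonitor) {
            OldDataMonitor odm = (OldDataMonitor) m
            // copy the key set so we don't mutate the map while iterating
            List<Saveable> flagged = new ArrayList<Saveable>(odm.getData().keySet())
            println("old-data entries: " + flagged.size())
            for (Saveable s : flagged) {
                s.save()
            }
        }
    }

If that empties the old-data list, the ~3.2 GB that OldDataMonitor is
retaining should become collectible after a restart.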
-tim
On Monday, December 9, 2013 9:41:25 AM UTC-5, Tim Drury wrote:
>
> I intended to install 1.532 on Friday, but mistakenly installed 1.539. It
> gave us the same OOM exceptions. I'm installing 1.532 now and will -
> hopefully - know tomorrow whether it's stable or not. I'm not exactly sure
> what's going to happen with our plugins though. Hopefully Jenkins will
> tell me if they must be downgraded too.
>
> -tim
>
> On Monday, December 9, 2013 7:45:28 AM UTC-5, Stephen Connolly wrote:
>>
>> How does the current LTS (1.532.1) hold up?
>>
>>
>> On 6 December 2013 13:33, Tim Drury <[email protected]> wrote:
>>
>>> We updated Jenkins to 1.542 two days ago (from 1.514) and we're getting
>>> a lot of OOM errors. (Info: Windows Server 2008 R2; Jenkins JVM is
>>> 64-bit JDK 1.6.0_26.)
>>>
>>> At first I did the simplest thing and increased the heap from 3G to 4.2G
>>> (and bumped up permgen). This didn't help so I started looking at threads
>>> via the Jenkins monitoring tool. It indicated the disk-usage plugin was
>>> hung. When you tried to view a page for a particularly large job, the page
>>> would "hang" and the stack trace showed the disk-usage plugin was to blame
>>> (or so I thought). Jira report with thread dump here:
>>> https://issues.jenkins-ci.org/browse/JENKINS-20876
>>>
>>> We disabled the disk-usage plugin and restarted and now we can visit
>>> that job page. However, we still get OOM and lots of GCs in the logs at
>>> least once a day. The stack trace looks frighteningly similar to the one
>>> from the disk-usage plugin. Here is an edited stack trace showing the
>>> methods common to both OOM incidents: one while the disk-usage plugin
>>> was enabled and one after it was disabled:
>>>
>>> [lots of xstream methods snipped]
>>> hudson.XmlFile.unmarshal(XmlFile.java:165)
>>> hudson.model.Run.reload(Run.java:323)
>>> hudson.model.Run.<init>(Run.java:312)
>>> hudson.model.AbstractBuild.<init>(AbstractBuild.java:185)
>>> hudson.maven.AbstractMavenBuild.<init>(AbstractMavenBuild.java:54)
>>> hudson.maven.MavenModuleSetBuild.<init>(MavenModuleSetBuild.java:146)
>>> ... [JVM methods snipped]
>>> hudson.model.AbstractProject.loadBuild(AbstractProject.java:1155)
>>> hudson.model.AbstractProject$1.create(AbstractProject.java:342)
>>> hudson.model.AbstractProject$1.create(AbstractProject.java:340)
>>> hudson.model.RunMap.retrieve(RunMap.java:225)
>>> hudson.model.RunMap.retrieve(RunMap.java:59)
>>> jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:677)
>>> jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:660)
>>> jenkins.model.lazy.AbstractLazyLoadRunMap.search(AbstractLazyLoadRunMap.java:502)
>>> jenkins.model.lazy.AbstractLazyLoadRunMap.getByNumber(AbstractLazyLoadRunMap.java:536)
>>> hudson.model.AbstractProject.getBuildByNumber(AbstractProject.java:1077)
>>> hudson.maven.MavenBuild.getParentBuild(MavenBuild.java:165)
>>> hudson.maven.MavenBuild.getWhyKeepLog(MavenBuild.java:273)
>>> hudson.model.Run.isKeepLog(Run.java:572)
>>> ...
>>>
>>> It seems something in "core" Jenkins has changed, and not for the better.
>>> Is anyone else seeing these issues?
>>>
>>> -tim