I've just cracked out MAT on an OOM dump from our machine, and I can confirm
that it looks like OldDataMonitor is the culprit here, too (750 MB of
retained heap).

There are over a million entries in its hashmap...
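In case it helps anyone else stuck in the same state: since "Discard Unreadable Data" dies with that ConcurrentModificationException, a rough offline workaround is to strip the stale disk-usage elements out of the build.xml files directly, with Jenkins stopped and a backup taken first. This is only a sketch under my assumptions: the tag name is my guess at how XStream serializes hudson.plugins.disk_usage.BuildDiskUsageAction (XStream doubles underscores in class names), and the jobs/*/builds/* layout won't cover nested maven module builds, so check one affected build.xml by hand before running anything destructive:

```python
import re
from pathlib import Path

# Assumed XStream tag for hudson.plugins.disk_usage.BuildDiskUsageAction
# (XStream escapes '_' in class names as '__') -- verify against a real
# affected build.xml before trusting this.
STALE_TAG = "hudson.plugins.disk__usage.BuildDiskUsageAction"
STALE_ELEMENT = re.compile(
    r"<{0}[^>]*>.*?</{0}>".format(re.escape(STALE_TAG)), re.DOTALL
)

def find_stale_builds(jenkins_home):
    """Yield build.xml files that still contain the stale element."""
    for build_xml in Path(jenkins_home).glob("jobs/*/builds/*/build.xml"):
        if STALE_TAG in build_xml.read_text(errors="replace"):
            yield build_xml

def strip_stale_element(build_xml):
    """Delete the stale element in place; returns True if anything changed."""
    text = build_xml.read_text(errors="replace")
    cleaned = STALE_ELEMENT.sub("", text)
    if cleaned == text:
        return False
    build_xml.write_text(cleaned)
    return True
```

Run find_stale_builds() first to see how many builds are affected (the count should roughly match the rows on the Manage Old Data page), then strip and restart Jenkins so OldDataMonitor has nothing left to accumulate.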

On Mon, Dec 9, 2013 at 4:32 PM, Tim Drury <[email protected]> wrote:

> I'm doing a heap-dump analysis now and I think I might know what the issue
> was.  The start of this whole problem was the disk-usage plugin hanging our
> attempts to view a job in Jenkins (see
> https://issues.jenkins-ci.org/browse/JENKINS-20876) so we disabled that
> plugin.  After disabling, Jenkins complained about data in an
> older/unreadable format:
>
> You have data stored in an older format and/or unreadable data.
>
> If I click the "Manage" button to delete it, it takes a _long_ time for it
> to display all the disk-usage plugin data - there must be thousands of
> rows, but it does display it all eventually.  The error shown in each row
> is:
>
> CannotResolveClassException: hudson.plugins.disk_usage.BuildDiskUsageAction
>
> If I click "Discard Unreadable Data" at the bottom of the page, I quickly
> get a stack trace:
>
> javax.servlet.ServletException: java.util.ConcurrentModificationException
> at org.kohsuke.stapler.Stapler.tryInvoke(Stapler.java:735)
> at org.kohsuke.stapler.Stapler.invoke(Stapler.java:799)
> at org.kohsuke.stapler.MetaClass$6.doDispatch(MetaClass.java:239)
> at org.kohsuke.stapler.NameBasedDispatcher.dispatch(NameBasedDispatcher.java:53)
> at org.kohsuke.stapler.Stapler.tryInvoke(Stapler.java:685)
> at org.kohsuke.stapler.Stapler.invoke(Stapler.java:799)
> at org.kohsuke.stapler.Stapler.invoke(Stapler.java:587)
> at org.kohsuke.stapler.Stapler.service(Stapler.java:218)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:45)
> at winstone.ServletConfiguration.execute(ServletConfiguration.java:248)
> at winstone.RequestDispatcher.forward(RequestDispatcher.java:333)
> at winstone.RequestDispatcher.doFilter(RequestDispatcher.java:376)
> at hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:96)
> at net.bull.javamelody.MonitoringFilter.doFilter(MonitoringFilter.java:203)
> at net.bull.javamelody.MonitoringFilter.doFilter(MonitoringFilter.java:181)
> at net.bull.javamelody.PluginMonitoringFilter.doFilter(PluginMonitoringFilter.java:86)
>
> and it fails to discard the data.  Older data isn't usually a problem so I
> brushed off this error.  However, here is dominator_tree of the heap dump:
>
> Class Name                                                                    | Shallow Heap | Retained Heap | Percentage
> ------------------------------------------------------------------------------------------------------------------------
> hudson.diagnosis.OldDataMonitor @ 0x6f9f2c4a0                                 |           24 | 3,278,466,984 |     88.69%
> com.thoughtworks.xstream.converters.SingleValueConverterWrapper @ 0x6f9da8780 |           16 |    13,825,616 |      0.37%
> hudson.model.Hudson @ 0x6f9b8b8e8                                             |          272 |     3,572,400 |      0.10%
> org.eclipse.jetty.webapp.WebAppClassLoader @ 0x6f9a73598                      |           88 |     2,308,760 |      0.06%
> org.apache.commons.jexl.util.introspection.Introspector @ 0x6fbb74710         |           32 |     1,842,392 |      0.05%
> org.kohsuke.stapler.WebApp @ 0x6f9c0ff10                                      |           64 |     1,127,480 |      0.03%
> java.lang.Thread @ 0x7d5c2d138  Handling GET /view/Alle/job/common-translation-main/ : RequestHandlerThread[#105] Thread |          112 |       971,336 |      0.03%
> ------------------------------------------------------------------------------------------------------------------------
>
> What is hudson.diagnosis.OldDataMonitor?  Could the disk-usage plugin data
> be the cause of all my recent OOM errors?  If so, how do I get rid of it?
>
> -tim
>
>
> On Monday, December 9, 2013 9:41:25 AM UTC-5, Tim Drury wrote:
>>
>> I intended to install 1.532 on Friday, but mistakenly installed 1.539.
>>  It gave us the same OOM exceptions.  I'm installing 1.532 now and will -
>> hopefully - know tomorrow whether it's stable or not.  I'm not exactly sure
>> what's going to happen with our plugins though.  Hopefully Jenkins will
>> tell me if they must be downgraded too.
>>
>> -tim
>>
>> On Monday, December 9, 2013 7:45:28 AM UTC-5, Stephen Connolly wrote:
>>>
>>> How does the current LTS (1.532.1) hold up?
>>>
>>>
>>> On 6 December 2013 13:33, Tim Drury <[email protected]> wrote:
>>>
>>>> We updated Jenkins to 1.542 two days ago (from 1.514) and we're getting
>>>> a lot of OOM errors. (info: Windows Server 2008 R2, Jenkins JVM is
>>>> jdk-x64-1.6.0_26)
>>>>
>>>> At first I did the simplest thing and increased the heap from 3G to
>>>> 4.2G (and bumped up permgen).  This didn't help so I started looking at
>>>> threads via the Jenkins monitoring tool.  It indicated the disk-usage
>>>> plugin was hung.  When you tried to view a page for a particularly large
>>>> job, the page would "hang" and the stack trace showed the disk-usage plugin
>>>> was to blame (or so I thought).  Jira report with thread dump here:
>>>> https://issues.jenkins-ci.org/browse/JENKINS-20876
>>>>
>>>> We disabled the disk-usage plugin and restarted and now we can visit
>>>> that job page.  However, we still get OOM and lots of GCs in the logs at
>>>> least once a day.  The stack trace looks frighteningly similar to that from
>>>> the disk-usage plugin.  Here is an edited stack trace showing the methods
>>>> common between the two OOM incidents: one during the disk-usage plugin and
>>>> one after it was disabled:
>>>>
>>>> [lots of xstream methods snipped]
>>>> hudson.XmlFile.unmarshal(XmlFile.java:165)
>>>> hudson.model.Run.reload(Run.java:323)
>>>> hudson.model.Run.<init>(Run.java:312)
>>>> hudson.model.AbstractBuild.<init>(AbstractBuild.java:185)
>>>> hudson.maven.AbstractMavenBuild.<init>(AbstractMavenBuild.java:54)
>>>> hudson.maven.MavenModuleSetBuild.<init>(MavenModuleSetBuild.java:146)
>>>> ... [JVM methods snipped]
>>>> hudson.model.AbstractProject.loadBuild(AbstractProject.java:1155)
>>>> hudson.model.AbstractProject$1.create(AbstractProject.java:342)
>>>> hudson.model.AbstractProject$1.create(AbstractProject.java:340)
>>>> hudson.model.RunMap.retrieve(RunMap.java:225)
>>>> hudson.model.RunMap.retrieve(RunMap.java:59)
>>>> jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:677)
>>>> jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:660)
>>>> jenkins.model.lazy.AbstractLazyLoadRunMap.search(AbstractLazyLoadRunMap.java:502)
>>>> jenkins.model.lazy.AbstractLazyLoadRunMap.getByNumber(AbstractLazyLoadRunMap.java:536)
>>>> hudson.model.AbstractProject.getBuildByNumber(AbstractProject.java:1077)
>>>> hudson.maven.MavenBuild.getParentBuild(MavenBuild.java:165)
>>>> hudson.maven.MavenBuild.getWhyKeepLog(MavenBuild.java:273)
>>>> hudson.model.Run.isKeepLog(Run.java:572)
>>>> ...
>>>>
>>>> It seems something in "core" Jenkins has changed and not for the
>>>> better.  Anyone seeing these issues?
>>>>
>>>> -tim
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "Jenkins Users" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>>
>>>
