So this is what is happening for us:

- The disk-usage plugin was showing the problems described at the beginning of the thread, so we disabled it.
- Now every build that we do, and every sub-project, fills up the 'Old data' log with hundreds of
  CannotResolveClassException: hudson.plugins.disk_usage.BuildDiskUsageAction
  even though that plugin is not used in that build, and does not exist any more.

After a modest number of builds (say, half a day or so), Jenkins bombs with OOM as this log fills up with *millions* of entries, and it's game over.

Is there a way to disable this functionality? I can't see the utility of it, and it's making the system totally unusable.
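For what it's worth, the only off-switch I can think of is disabling the administrative monitor itself from the Script Console. A rough, untested Groovy sketch follows; I don't know whether disabling the monitor also stops it from collecting new entries:

    import hudson.diagnosis.OldDataMonitor
    import hudson.model.AdministrativeMonitor

    // Find the registered OldDataMonitor among Jenkins' administrative
    // monitors and persist its "disabled" flag.
    def monitor = AdministrativeMonitor.all().find { it instanceof OldDataMonitor }
    monitor?.disable(true)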
On Wed, Dec 11, 2013 at 5:55 PM, Nigel Magnay <[email protected]> wrote:

> I've just cracked out MAT on an OOM dump from our machine, and I can
> confirm that it looks like OldDataMonitor is the culprit here, too (750MB
> of retained heap).
>
> There's over a million entries in the hashmap...
>
>
> On Mon, Dec 9, 2013 at 4:32 PM, Tim Drury <[email protected]> wrote:
>
>> I'm doing a heap-dump analysis now and I think I might know what the
>> issue was. The start of this whole problem was the disk-usage plugin
>> hanging our attempts to view a job in Jenkins (see
>> https://issues.jenkins-ci.org/browse/JENKINS-20876), so we disabled that
>> plugin. After disabling it, Jenkins complained about data in an
>> older/unreadable format:
>>
>> You have data stored in an older format and/or unreadable data.
>>
>> If I click the "Manage" button to delete it, it takes a _long_ time for
>> it to display all the disk-usage plugin data - there must be thousands of
>> rows, but it does display it all eventually. The error shown in each row is:
>>
>> CannotResolveClassException: hudson.plugins.disk_usage.BuildDiskUsageAction
>>
>> If I click "Discard Unreadable Data" at the bottom of the page, I quickly
>> get a stack trace:
>>
>> javax.servlet.ServletException: java.util.ConcurrentModificationException
>>   at org.kohsuke.stapler.Stapler.tryInvoke(Stapler.java:735)
>>   at org.kohsuke.stapler.Stapler.invoke(Stapler.java:799)
>>   at org.kohsuke.stapler.MetaClass$6.doDispatch(MetaClass.java:239)
>>   at org.kohsuke.stapler.NameBasedDispatcher.dispatch(NameBasedDispatcher.java:53)
>>   at org.kohsuke.stapler.Stapler.tryInvoke(Stapler.java:685)
>>   at org.kohsuke.stapler.Stapler.invoke(Stapler.java:799)
>>   at org.kohsuke.stapler.Stapler.invoke(Stapler.java:587)
>>   at org.kohsuke.stapler.Stapler.service(Stapler.java:218)
>>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:45)
>>   at winstone.ServletConfiguration.execute(ServletConfiguration.java:248)
>>   at winstone.RequestDispatcher.forward(RequestDispatcher.java:333)
>>   at winstone.RequestDispatcher.doFilter(RequestDispatcher.java:376)
>>   at hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:96)
>>   at net.bull.javamelody.MonitoringFilter.doFilter(MonitoringFilter.java:203)
>>   at net.bull.javamelody.MonitoringFilter.doFilter(MonitoringFilter.java:181)
>>   at net.bull.javamelody.PluginMonitoringFilter.doFilter(PluginMonitoringFilter.java:86)
>>
>> and it fails to discard the data. Older data isn't usually a problem, so
>> I brushed off this error.
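(Since the "Discard Unreadable Data" button keeps blowing up, one thing we may try instead is re-saving every build so each build.xml gets rewritten without the elements XStream couldn't resolve. A rough, untested Groovy sketch for the Script Console; note it lazy-loads every build, so it won't be cheap on memory:)

    import jenkins.model.Jenkins
    import hudson.model.Job

    // Re-save every build record; writing the in-memory model back out
    // should drop the entries that failed to deserialise.
    Jenkins.instance.getAllItems(Job.class).each { job ->
        job.builds.each { build -> build.save() }
    }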
>> However, here is the dominator tree of the heap dump:
>>
>> Class Name | Shallow Heap | Retained Heap | Percentage
>> hudson.diagnosis.OldDataMonitor @ 0x6f9f2c4a0 | 24 | 3,278,466,984 | 88.69%
>> com.thoughtworks.xstream.converters.SingleValueConverterWrapper @ 0x6f9da8780 | 16 | 13,825,616 | 0.37%
>> hudson.model.Hudson @ 0x6f9b8b8e8 | 272 | 3,572,400 | 0.10%
>> org.eclipse.jetty.webapp.WebAppClassLoader @ 0x6f9a73598 | 88 | 2,308,760 | 0.06%
>> org.apache.commons.jexl.util.introspection.Introspector @ 0x6fbb74710 | 32 | 1,842,392 | 0.05%
>> org.kohsuke.stapler.WebApp @ 0x6f9c0ff10 | 64 | 1,127,480 | 0.03%
>> java.lang.Thread @ 0x7d5c2d138 Handling GET /view/Alle/job/common-translation-main/ : RequestHandlerThread[#105] Thread | 112 | 971,336 | 0.03%
>>
>> What is hudson.diagnosis.OldDataMonitor? Could the disk-usage plugin
>> data be the cause of all my recent OOM errors? If so, how do I get rid of it?
>>
>> -tim
>>
>>
>> On Monday, December 9, 2013 9:41:25 AM UTC-5, Tim Drury wrote:
>>>
>>> I intended to install 1.532 on Friday, but mistakenly installed 1.539.
>>> It gave us the same OOM exceptions. I'm installing 1.532 now and will -
>>> hopefully - know tomorrow whether it's stable or not. I'm not exactly sure
>>> what's going to happen with our plugins though. Hopefully Jenkins will
>>> tell me if they must be downgraded too.
>>>
>>> -tim
>>>
>>> On Monday, December 9, 2013 7:45:28 AM UTC-5, Stephen Connolly wrote:
>>>>
>>>> How does the current LTS (1.532.1) hold up?
>>>>
>>>>
>>>> On 6 December 2013 13:33, Tim Drury <[email protected]> wrote:
>>>>
>>>>> We updated Jenkins to 1.542 two days ago (from 1.514) and we're
>>>>> getting a lot of OOM errors. (info: Windows Server 2008 R2, Jenkins JVM is
>>>>> jdk-x64-1.6.0_26)
>>>>>
>>>>> At first I did the simplest thing and increased the heap from 3G to
>>>>> 4.2G (and bumped up permgen). This didn't help, so I started looking at
>>>>> threads via the Jenkins monitoring tool. It indicated the disk-usage
>>>>> plugin was hung. When you tried to view a page for a particularly large
>>>>> job, the page would "hang" and the stack trace showed the disk-usage plugin
>>>>> was to blame (or so I thought). Jira report with thread dump here:
>>>>> https://issues.jenkins-ci.org/browse/JENKINS-20876
>>>>>
>>>>> We disabled the disk-usage plugin and restarted, and now we can visit
>>>>> that job page. However, we still get OOM and lots of GCs in the logs at
>>>>> least once a day. The stack trace looks frighteningly similar to that from
>>>>> the disk-usage plugin.
>>>>> Here is an edited stack trace showing the methods
>>>>> common between the two OOM incidents: one during the disk-usage plugin and
>>>>> one after it was disabled:
>>>>>
>>>>> [lots of xstream methods snipped]
>>>>> hudson.XmlFile.unmarshal(XmlFile.java:165)
>>>>> hudson.model.Run.reload(Run.java:323)
>>>>> hudson.model.Run.<init>(Run.java:312)
>>>>> hudson.model.AbstractBuild.<init>(AbstractBuild.java:185)
>>>>> hudson.maven.AbstractMavenBuild.<init>(AbstractMavenBuild.java:54)
>>>>> hudson.maven.MavenModuleSetBuild.<init>(MavenModuleSetBuild.java:146)
>>>>> ... [JVM methods snipped]
>>>>> hudson.model.AbstractProject.loadBuild(AbstractProject.java:1155)
>>>>> hudson.model.AbstractProject$1.create(AbstractProject.java:342)
>>>>> hudson.model.AbstractProject$1.create(AbstractProject.java:340)
>>>>> hudson.model.RunMap.retrieve(RunMap.java:225)
>>>>> hudson.model.RunMap.retrieve(RunMap.java:59)
>>>>> jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:677)
>>>>> jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:660)
>>>>> jenkins.model.lazy.AbstractLazyLoadRunMap.search(AbstractLazyLoadRunMap.java:502)
>>>>> jenkins.model.lazy.AbstractLazyLoadRunMap.getByNumber(AbstractLazyLoadRunMap.java:536)
>>>>> hudson.model.AbstractProject.getBuildByNumber(AbstractProject.java:1077)
>>>>> hudson.maven.MavenBuild.getParentBuild(MavenBuild.java:165)
>>>>> hudson.maven.MavenBuild.getWhyKeepLog(MavenBuild.java:273)
>>>>> hudson.model.Run.isKeepLog(Run.java:572)
>>>>> ...
>>>>>
>>>>> It seems something in "core" Jenkins has changed and not for the
>>>>> better. Anyone seeing these issues?
>>>>>
>>>>> -tim
