I had some success getting the Manage Old Data screen to work.  Most of the 
time it throws a ConcurrentModificationException, but occasionally it lists 
a few hundred records along with the "Discard Old Data" button.  When I 
press the button, sometimes it "works" and sometimes it throws a CME, but 
in either case it does seem to delete some of the old data.  I repeated 
this process about every hour a couple of days ago and deleted enough old 
data that Jenkins kept running for more than a day.  The best chance of 
this working is when no build jobs are running.

The alternative is to manually delete the disk-usage XML elements from the 
build.xml files in each job's build directories.  I did this for about 200 
files before I got tired of it.  A Groovy script could probably be written 
to do this.

-tim

On Thursday, December 12, 2013 5:26:42 AM UTC-5, nigelm wrote:
>
> So this is what is happening for us :
>
> - The disk-usage plugin was causing the problems described at the beginning 
> of the thread, so we disabled it.
> - Now every build we do, and every sub-project, fills up the 'Old data' 
> log with hundreds of 
> CannotResolveClassException: hudson.plugins.disk_usage.BuildDiskUsageAction
>
> even though that plugin is not used in that build, and does not exist any 
> more.
>
> After a modest number of builds (say, half a day's worth), Jenkins bombs 
> with an OOM as this log fills with *millions* of entries, and it's game over.
>
> Is there a way to disable this functionality? I can't see the utility of 
> it, and it's making the system totally unusable.
>
>
>
> On Wed, Dec 11, 2013 at 5:55 PM, Nigel Magnay <[email protected]> wrote:
>
>> I've just cracked out MAT on an OOM dump from our machine, and I can 
>> confirm that it looks like OldDataMonitor is the culprit here, too (750Mb 
>> of retained heap).
>>
>> There are over a million entries in the hashmap...
>>
>>
>>  
>>
>> On Mon, Dec 9, 2013 at 4:32 PM, Tim Drury <[email protected]> wrote:
>>
>>> I'm doing a heap-dump analysis now and I think I might know what the 
>>> issue was.  The start of this whole problem was the disk-usage plugin 
>>> hanging our attempts to view a job in Jenkins (see 
>>> https://issues.jenkins-ci.org/browse/JENKINS-20876) so we disabled that 
>>> plugin.  After disabling, Jenkins complained about data in an 
>>> older/unreadable format:
>>>
>>> You have data stored in an older format and/or unreadable data.
>>>
>>> If I click the "Manage" button to delete it, it takes a _long_ time for 
>>> it to display all the disk-usage plugin data - there must be thousands of 
>>> rows, but it does display it all eventually.  The error shown in each row 
>>> is:
>>>
>>> CannotResolveClassException: 
>>> hudson.plugins.disk_usage.BuildDiskUsageAction
>>>  
>>> If I click "Discard Unreadable Data" at the bottom of the page, I 
>>> quickly get a stack trace:
>>>
>>> javax.servlet.ServletException: java.util.ConcurrentModificationException
>>> at org.kohsuke.stapler.Stapler.tryInvoke(Stapler.java:735)
>>> at org.kohsuke.stapler.Stapler.invoke(Stapler.java:799)
>>> at org.kohsuke.stapler.MetaClass$6.doDispatch(MetaClass.java:239)
>>> at org.kohsuke.stapler.NameBasedDispatcher.dispatch(NameBasedDispatcher.java:53)
>>> at org.kohsuke.stapler.Stapler.tryInvoke(Stapler.java:685)
>>> at org.kohsuke.stapler.Stapler.invoke(Stapler.java:799)
>>> at org.kohsuke.stapler.Stapler.invoke(Stapler.java:587)
>>> at org.kohsuke.stapler.Stapler.service(Stapler.java:218)
>>> at javax.servlet.http.HttpServlet.service(HttpServlet.java:45)
>>> at winstone.ServletConfiguration.execute(ServletConfiguration.java:248)
>>> at winstone.RequestDispatcher.forward(RequestDispatcher.java:333)
>>> at winstone.RequestDispatcher.doFilter(RequestDispatcher.java:376)
>>> at hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:96)
>>> at net.bull.javamelody.MonitoringFilter.doFilter(MonitoringFilter.java:203)
>>> at net.bull.javamelody.MonitoringFilter.doFilter(MonitoringFilter.java:181)
>>> at net.bull.javamelody.PluginMonitoringFilter.doFilter(PluginMonitoringFilter.java:86)
>>>
>>> and it fails to discard the data.  Older data isn't usually a problem so 
>>> I brushed off this error.  However, here is dominator_tree of the heap dump:
>>>
>>> Class Name                                                                    | Shallow Heap | Retained Heap | Percentage
>>> ------------------------------------------------------------------------------|--------------|---------------|-----------
>>> hudson.diagnosis.OldDataMonitor @ 0x6f9f2c4a0                                 |           24 | 3,278,466,984 |     88.69%
>>> com.thoughtworks.xstream.converters.SingleValueConverterWrapper @ 0x6f9da8780 |           16 |    13,825,616 |      0.37%
>>> hudson.model.Hudson @ 0x6f9b8b8e8                                             |          272 |     3,572,400 |      0.10%
>>> org.eclipse.jetty.webapp.WebAppClassLoader @ 0x6f9a73598                      |           88 |     2,308,760 |      0.06%
>>> org.apache.commons.jexl.util.introspection.Introspector @ 0x6fbb74710         |           32 |     1,842,392 |      0.05%
>>> org.kohsuke.stapler.WebApp @ 0x6f9c0ff10                                      |           64 |     1,127,480 |      0.03%
>>> java.lang.Thread @ 0x7d5c2d138 (Handling GET /view/Alle/job/common-translation-main/ : RequestHandlerThread[#105]) | 112 | 971,336 | 0.03%
>>> ------------------------------------------------------------------------------|--------------|---------------|-----------
>>>
>>> What is hudson.diagnosis.OldDataMonitor?  Could the disk-usage plugin 
>>> data be the cause of all my recent OOM errors?  If so, how do I get rid of 
>>> it?
>>>
>>> -tim
>>>
>>>
>>> On Monday, December 9, 2013 9:41:25 AM UTC-5, Tim Drury wrote:
>>>>
>>>> I intended to install 1.532 on Friday, but mistakenly installed 1.539. 
>>>>  It gave us the same OOM exceptions.  I'm installing 1.532 now and will - 
>>>> hopefully - know tomorrow whether it's stable or not.  I'm not exactly 
>>>> sure 
>>>> what's going to happen with our plugins though.  Hopefully Jenkins will 
>>>> tell me if they must be downgraded too.
>>>>
>>>> -tim
>>>>
>>>> On Monday, December 9, 2013 7:45:28 AM UTC-5, Stephen Connolly wrote:
>>>>>
>>>>> How does the current LTS (1.532.1) hold up?
>>>>>
>>>>>
>>>>> On 6 December 2013 13:33, Tim Drury <[email protected]> wrote:
>>>>>
>>>>>> We updated Jenkins to 1.542 two days ago (from 1.514) and we're 
>>>>>> getting a lot of OOM errors.  (Environment: Windows Server 2008 R2; 
>>>>>> the Jenkins JVM is jdk-x64-1.6.0_26.)
>>>>>>
>>>>>> At first I did the simplest thing and increased the heap from 3G to 
>>>>>> 4.2G (and bumped up permgen).  This didn't help so I started looking at 
>>>>>> threads via the Jenkins monitoring tool.  It indicated the disk-usage 
>>>>>> plugin was hung.  When you tried to view a page for a particularly large 
>>>>>> job, the page would "hang" and the stack trace showed the disk-usage 
>>>>>> plugin 
>>>>>> was to blame (or so I thought).  Jira report with thread dump here: 
>>>>>> https://issues.jenkins-ci.org/browse/JENKINS-20876
>>>>>>
>>>>>> We disabled the disk-usage plugin and restarted and now we can visit 
>>>>>> that job page.  However, we still get OOM and lots of GCs in the logs at 
>>>>>> least once a day.  The stack trace looks frighteningly similar to that 
>>>>>> from 
>>>>>> the disk-usage plugin.  Here is an edited stack trace showing the 
>>>>>> methods 
>>>>>> common between the two OOM incidents: one during the disk-usage plugin 
>>>>>> and 
>>>>>> one after it was disabled:
>>>>>>
>>>>>> [lots of xstream methods snipped]
>>>>>> hudson.XmlFile.unmarshal(XmlFile.java:165)
>>>>>> hudson.model.Run.reload(Run.java:323)
>>>>>> hudson.model.Run.<init>(Run.java:312)
>>>>>>  hudson.model.AbstractBuild.<init>(AbstractBuild.java:185)
>>>>>> hudson.maven.AbstractMavenBuild.<init>(AbstractMavenBuild.java:54)
>>>>>> hudson.maven.MavenModuleSetBuild.<init>(MavenModuleSetBuild.java:146)
>>>>>> ... [JVM methods snipped]
>>>>>> hudson.model.AbstractProject.loadBuild(AbstractProject.java:1155)
>>>>>> hudson.model.AbstractProject$1.create(AbstractProject.java:342)
>>>>>> hudson.model.AbstractProject$1.create(AbstractProject.java:340)
>>>>>> hudson.model.RunMap.retrieve(RunMap.java:225)
>>>>>> hudson.model.RunMap.retrieve(RunMap.java:59)
>>>>>> jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:677)
>>>>>> jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:660)
>>>>>> jenkins.model.lazy.AbstractLazyLoadRunMap.search(AbstractLazyLoadRunMap.java:502)
>>>>>> jenkins.model.lazy.AbstractLazyLoadRunMap.getByNumber(AbstractLazyLoadRunMap.java:536)
>>>>>> hudson.model.AbstractProject.getBuildByNumber(AbstractProject.java:1077)
>>>>>> hudson.maven.MavenBuild.getParentBuild(MavenBuild.java:165)
>>>>>> hudson.maven.MavenBuild.getWhyKeepLog(MavenBuild.java:273)
>>>>>> hudson.model.Run.isKeepLog(Run.java:572)
>>>>>> ...
>>>>>>
>>>>>> It seems something in "core" Jenkins has changed, and not for the 
>>>>>> better.  Is anyone else seeing these issues?
>>>>>>
>>>>>> -tim
>>>>>>  
>>>>>> -- 
>>>>>> You received this message because you are subscribed to the Google 
>>>>>> Groups "Jenkins Users" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it, 
>>>>>> send an email to [email protected].
>>>>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>>>>
>>>>>
>>>
>>
>>
>
