I had some success getting the Manage Old Data screen to work. Most of the time it throws a ConcurrentModificationException, but occasionally it lists a few hundred records with the "Discard Old Data" button. I press the button and, again, sometimes it "works" and sometimes throws a CME, but in either case it does seem to delete some of the old data. I repeated this process about every hour a couple of days ago and managed to delete enough old data that Jenkins continued to run for more than a day. The best chance of this working is when no build jobs are running.
The alternative is to manually delete the disk-usage XML elements from the build.xml files in each job's build directories. I did this for about 200 files before I got tired of doing it. A Groovy script could probably be written to do this.

-tim

On Thursday, December 12, 2013 5:26:42 AM UTC-5, nigelm wrote:
>
> So this is what is happening for us:
>
> - The disk-usage plugin was displaying the problems at the beginning of
> the thread, so we disabled it.
> - Now, every build that we do, and every sub-project, fills up the 'Old
> data' log with hundreds of
>
> CannotResolveClassException: hudson.plugins.disk_usage.BuildDiskUsageAction
>
> even though that plugin is not used in that build, and does not exist any
> more.
>
> After a modest number of builds (say, half a day or so), Jenkins bombs with
> OOM as this log is filled with *millions* of entries, and it's game over.
>
> Is there a way to disable this functionality? I can't see the utility of
> it, and it's making the system totally unusable.
>
>
> On Wed, Dec 11, 2013 at 5:55 PM, Nigel Magnay <[email protected]> wrote:
>
>> I've just cracked out MAT on an OOM dump from our machine, and I can
>> confirm that it looks like OldDataMonitor is the culprit here, too (750Mb
>> of retained heap).
>>
>> There's over a million entries in the hashmap...
>>
>>
>> On Mon, Dec 9, 2013 at 4:32 PM, Tim Drury <[email protected]> wrote:
>>
>>> I'm doing a heap-dump analysis now and I think I might know what the
>>> issue was. The start of this whole problem was the disk-usage plugin
>>> hanging our attempts to view a job in Jenkins (see
>>> https://issues.jenkins-ci.org/browse/JENKINS-20876), so we disabled that
>>> plugin. After disabling it, Jenkins complained about data in an
>>> older/unreadable format:
>>>
>>> You have data stored in an older format and/or unreadable data.
>>>
>>> If I click the "Manage" button to delete it, it takes a _long_ time for
>>> it to display all the disk-usage plugin data - there must be thousands of
>>> rows, but it does display it all eventually. The error shown in each row
>>> is:
>>>
>>> CannotResolveClassException:
>>> hudson.plugins.disk_usage.BuildDiskUsageAction
>>>
>>> If I click "Discard Unreadable Data" at the bottom of the page, I
>>> quickly get a stack trace:
>>>
>>> javax.servlet.ServletException: java.util.ConcurrentModificationException
>>> at org.kohsuke.stapler.Stapler.tryInvoke(Stapler.java:735)
>>> at org.kohsuke.stapler.Stapler.invoke(Stapler.java:799)
>>> at org.kohsuke.stapler.MetaClass$6.doDispatch(MetaClass.java:239)
>>> at org.kohsuke.stapler.NameBasedDispatcher.dispatch(NameBasedDispatcher.java:53)
>>> at org.kohsuke.stapler.Stapler.tryInvoke(Stapler.java:685)
>>> at org.kohsuke.stapler.Stapler.invoke(Stapler.java:799)
>>> at org.kohsuke.stapler.Stapler.invoke(Stapler.java:587)
>>> at org.kohsuke.stapler.Stapler.service(Stapler.java:218)
>>> at javax.servlet.http.HttpServlet.service(HttpServlet.java:45)
>>> at winstone.ServletConfiguration.execute(ServletConfiguration.java:248)
>>> at winstone.RequestDispatcher.forward(RequestDispatcher.java:333)
>>> at winstone.RequestDispatcher.doFilter(RequestDispatcher.java:376)
>>> at hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:96)
>>> at net.bull.javamelody.MonitoringFilter.doFilter(MonitoringFilter.java:203)
>>> at net.bull.javamelody.MonitoringFilter.doFilter(MonitoringFilter.java:181)
>>> at net.bull.javamelody.PluginMonitoringFilter.doFilter(PluginMonitoringFilter.java:86)
>>>
>>> and it fails to discard the data. Older data isn't usually a problem so
>>> I brushed off this error.
>>> However, here is the dominator_tree of the heap dump:
>>>
>>> Class Name                                                                    | Shallow Heap | Retained Heap | Percentage
>>> --------------------------------------------------------------------------------------------------------------------------
>>> hudson.diagnosis.OldDataMonitor @ 0x6f9f2c4a0                                 |           24 | 3,278,466,984 |     88.69%
>>> com.thoughtworks.xstream.converters.SingleValueConverterWrapper @ 0x6f9da8780 |           16 |    13,825,616 |      0.37%
>>> hudson.model.Hudson @ 0x6f9b8b8e8                                             |          272 |     3,572,400 |      0.10%
>>> org.eclipse.jetty.webapp.WebAppClassLoader @ 0x6f9a73598                      |           88 |     2,308,760 |      0.06%
>>> org.apache.commons.jexl.util.introspection.Introspector @ 0x6fbb74710         |           32 |     1,842,392 |      0.05%
>>> org.kohsuke.stapler.WebApp @ 0x6f9c0ff10                                      |           64 |     1,127,480 |      0.03%
>>> java.lang.Thread @ 0x7d5c2d138 Handling GET /view/Alle/job/common-translation-main/ : RequestHandlerThread[#105] Thread | 112 | 971,336 | 0.03%
>>> --------------------------------------------------------------------------------------------------------------------------
>>>
>>> What is hudson.diagnosis.OldDataMonitor? Could the disk-usage plugin
>>> data be the cause of all my recent OOM errors? If so, how do I get rid of
>>> it?
>>>
>>> -tim
>>>
>>>
>>> On Monday, December 9, 2013 9:41:25 AM UTC-5, Tim Drury wrote:
>>>>
>>>> I intended to install 1.532 on Friday, but mistakenly installed 1.539.
>>>> It gave us the same OOM exceptions. I'm installing 1.532 now and will -
>>>> hopefully - know tomorrow whether it's stable or not. I'm not exactly sure
>>>> what's going to happen with our plugins though. Hopefully Jenkins will
>>>> tell me if they must be downgraded too.
>>>>
>>>> -tim
>>>>
>>>> On Monday, December 9, 2013 7:45:28 AM UTC-5, Stephen Connolly wrote:
>>>>>
>>>>> How does the current LTS (1.532.1) hold up?
>>>>>
>>>>>
>>>>> On 6 December 2013 13:33, Tim Drury <[email protected]> wrote:
>>>>>
>>>>>> We updated Jenkins to 1.542 two days ago (from 1.514) and we're
>>>>>> getting a lot of OOM errors. (info: Windows Server 2008 R2, Jenkins JVM
>>>>>> is jdk-x64-1.6.0_26)
>>>>>>
>>>>>> At first I did the simplest thing and increased the heap from 3G to
>>>>>> 4.2G (and bumped up permgen). This didn't help so I started looking at
>>>>>> threads via the Jenkins monitoring tool. It indicated the disk-usage
>>>>>> plugin was hung. When you tried to view a page for a particularly large
>>>>>> job, the page would "hang" and the stack trace showed the disk-usage
>>>>>> plugin was to blame (or so I thought). Jira report with thread dump here:
>>>>>> https://issues.jenkins-ci.org/browse/JENKINS-20876
>>>>>>
>>>>>> We disabled the disk-usage plugin and restarted and now we can visit
>>>>>> that job page. However, we still get OOM and lots of GCs in the logs at
>>>>>> least once a day. The stack trace looks frighteningly similar to that
>>>>>> from the disk-usage plugin. Here is an edited stack trace showing the
>>>>>> methods common between the two OOM incidents: one during the disk-usage
>>>>>> plugin and one after it was disabled:
>>>>>>
>>>>>> [lots of xstream methods snipped]
>>>>>> hudson.XmlFile.unmarshal(XmlFile.java:165)
>>>>>> hudson.model.Run.reload(Run.java:323)
>>>>>> hudson.model.Run.<init>(Run.java:312)
>>>>>> hudson.model.AbstractBuild.<init>(AbstractBuild.java:185)
>>>>>> hudson.maven.AbstractMavenBuild.<init>(AbstractMavenBuild.java:54)
>>>>>> hudson.maven.MavenModuleSetBuild.<init>(MavenModuleSetBuild.java:146)
>>>>>> ...
>>>>>> [JVM methods snipped]
>>>>>> hudson.model.AbstractProject.loadBuild(AbstractProject.java:1155)
>>>>>> hudson.model.AbstractProject$1.create(AbstractProject.java:342)
>>>>>> hudson.model.AbstractProject$1.create(AbstractProject.java:340)
>>>>>> hudson.model.RunMap.retrieve(RunMap.java:225)
>>>>>> hudson.model.RunMap.retrieve(RunMap.java:59)
>>>>>> jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:677)
>>>>>> jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:660)
>>>>>> jenkins.model.lazy.AbstractLazyLoadRunMap.search(AbstractLazyLoadRunMap.java:502)
>>>>>> jenkins.model.lazy.AbstractLazyLoadRunMap.getByNumber(AbstractLazyLoadRunMap.java:536)
>>>>>> hudson.model.AbstractProject.getBuildByNumber(AbstractProject.java:1077)
>>>>>> hudson.maven.MavenBuild.getParentBuild(MavenBuild.java:165)
>>>>>> hudson.maven.MavenBuild.getWhyKeepLog(MavenBuild.java:273)
>>>>>> hudson.model.Run.isKeepLog(Run.java:572)
>>>>>> ...
>>>>>>
>>>>>> It seems something in "core" Jenkins has changed and not for the
>>>>>> better. Anyone seeing these issues?
>>>>>>
>>>>>> -tim
>>>>>>
>>>>>> --
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "Jenkins Users" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>> send an email to [email protected].
>>>>>> For more options, visit https://groups.google.com/groups/opt_out.
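[Editor's note] Tim mentions at the top of the thread that "a groovy script could probably be written" to strip the stale disk-usage elements from each build.xml. Below is a minimal sketch of that idea, written in Python rather than Groovy so it can be tested standalone. The `jobs/*/builds/*/build.xml` glob and the `BuildDiskUsageAction` tag match are assumptions based on a standard JENKINS_HOME layout; back up the files and stop Jenkins before running anything like this.

```python
import glob
import xml.etree.ElementTree as ET

def strip_disk_usage(build_xml_path):
    """Remove any element whose tag mentions BuildDiskUsageAction.

    Returns True if the file was modified and rewritten.
    """
    tree = ET.parse(build_xml_path)
    changed = False
    # Snapshot the element list first so removals don't disturb iteration.
    for parent in list(tree.getroot().iter()):
        for child in list(parent):
            # XStream mangles underscores in package names when it builds
            # tag names, so match on the class name itself rather than
            # trying to reproduce the exact escaped tag.
            if "BuildDiskUsageAction" in child.tag:
                parent.remove(child)
                changed = True
    if changed:
        tree.write(build_xml_path, encoding="UTF-8", xml_declaration=True)
    return changed

if __name__ == "__main__":
    # Hypothetical JENKINS_HOME location -- adjust to your installation.
    for path in glob.glob("/var/lib/jenkins/jobs/*/builds/*/build.xml"):
        if strip_disk_usage(path):
            print("cleaned", path)
```

Matching on the class name in the tag, instead of the exact `<actions>` child element, keeps the script robust if the mangled tag differs slightly between XStream versions; everything else in the build.xml is left untouched.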
