Hi,

I did further investigation with jvisualvm. (You can use any version, even the newest one with a different bitness; it can always read the heap dump. I recommend the Java 7 64-bit one: it is the fanciest and does not itself OOM.)

> When looking at the MBean mess, it looks like:
> The whole VM is filled with MBean statistics (20% of the total heap!!!), just
> for statistics. It looks like the MBean server is not shut down correctly when
> the Solr instance shuts down, so it sums up while running tests, every new
> Solr instance adds new statistics to the huge MBean maps eating all the heap
> (and possibly permgen, because most strings may be interned)! This is a
> huge leak, we should fix this (or disable the whole useless MBean shit
> completely, at least for tests). Was this strange, never-seen package
> com.yammer.metrics introduced recently related to mbeans - or is
> zookeeper the bad guy?

It's much worse: the String instances are only 20% of the heap, but another 26% is used by the ConcurrentHashMap.Entry objects holding those references. Together with tons of ConcurrentHashMaps and com.yammer.metrics.core instances, this eats up 60% of the total heap space (counting only reachable objects, not those waiting to be GCed).

The big question: do we need com.yammer.metrics.core (it is metrics-core-2.1.2.jar in solr/core/lib) at all? When was it introduced? Lucene 3.6 does not have it, and neither does Solr 4.0. It must have been introduced recently, and it eats up all the memory.

Uwe

> > -----Original Message-----
> > From: Mark Miller [mailto:[email protected]]
> > Sent: Wednesday, December 26, 2012 3:22 AM
> > To: [email protected]
> > Subject: Re: [JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.6.0_37) - Build # 3421 - Failure!
> >
> > Is this one a nightly build?
> >
> > I can run it and look at it closely tomorrow.
> >
> > - Mark
> >
> > On Dec 25, 2012, at 6:04 PM, Uwe Schindler <[email protected]> wrote:
> >
> > > Can we add a finally/try block that catches permgen errors and calls
> > > System.halt (not exit)? I could add another extra allowance to the
> > > security manager, disallowing exits.
> > >
> > > But we should try to find the issue in the tests, maybe Mark has an idea.
> > > We have the heap dump readily available, but I don't have the tools to
> > > inspect it.
> > >
> > > Uwe
> > >
> > > Dawid Weiss <[email protected]> wrote:
> > > > the test framework crashes somehow and does not respond anymore.
> > >
> > > I think I know exactly how it crashes -- there's not much mystery
> > > about this: once the permgen is exhausted OOM errors are thrown from
> > > tests; what happens then is these errors are caught and an attempt
> > > is made to serialize these errors to the master node. Unfortunately
> > > this process involves loading some classes that are not yet loaded
> > > and, since the permgen is already exhausted, everything goes insane
> > > (the thread apparently just silently quits; there are finally blocks
> > > that are never reached).
> > >
> > > Like I said -- I'll see what I can do about it but I don't have any
> > > optimistic feelings. This is really riding a critical edge and short
> > > of preallocating static data structures I don't see any way of
> > > implementing a clean solution for the problem.
> > >
> > > Dawid
> > >
> > > To unsubscribe, e-mail: [email protected]
> > > For additional commands, e-mail: [email protected]
> > >
> > > --
> > > Uwe Schindler
> > > H.-H.-Meier-Allee 63, 28213 Bremen
> > > http://www.thetaphi.de
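
For illustration only, here is a minimal Java sketch of the kind of leak suspected in this mail: instances that register statistics MBeans on startup but never unregister them on shutdown, so the MBean server keeps every bean reachable across test runs. All names in it (MBeanLeakSketch, CoreStats, FakeInstance, the "sketch.leak" JMX domain) are made up for the example; it is not Solr or metrics-core code, and whether the real leak follows exactly this path is an assumption. Only standard javax.management calls are used.

```java
import java.lang.management.ManagementFactory;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.StandardMBean;

public class MBeanLeakSketch {

    /** Hypothetical statistics interface, standing in for real per-core statistics. */
    public interface CoreStats {
        long getRequests();
    }

    private static final MBeanServer SERVER = ManagementFactory.getPlatformMBeanServer();
    private static final String DOMAIN = "sketch.leak"; // made-up JMX domain for this example

    /** Hypothetical server instance that registers one statistics MBean when it starts. */
    static final class FakeInstance {
        private final ObjectName name;
        private final boolean unregisterOnShutdown;

        FakeInstance(int id, boolean unregisterOnShutdown) throws JMException {
            this.unregisterOnShutdown = unregisterOnShutdown;
            this.name = new ObjectName(DOMAIN + ":type=CoreStats,id=" + id);
            CoreStats stats = () -> 0L;
            SERVER.registerMBean(new StandardMBean(stats, CoreStats.class), name);
        }

        void shutdown() throws JMException {
            // If this is skipped, the MBean server keeps the bean (and its strings) reachable.
            if (unregisterOnShutdown) {
                SERVER.unregisterMBean(name);
            }
        }
    }

    /** Starts and stops {@code runs} instances, then reports how many MBeans are still registered. */
    public static int leftoverAfter(int runs, boolean unregisterOnShutdown) {
        try {
            ObjectName all = new ObjectName(DOMAIN + ":*");
            // Start from a clean slate so repeated calls do not collide on names.
            for (ObjectName old : SERVER.queryNames(all, null)) {
                SERVER.unregisterMBean(old);
            }
            for (int i = 0; i < runs; i++) {
                FakeInstance instance = new FakeInstance(i, unregisterOnShutdown);
                instance.shutdown();
            }
            return SERVER.queryNames(all, null).size();
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("leftover without cleanup: " + leftoverAfter(3, false));
        System.out.println("leftover with cleanup:    " + leftoverAfter(3, true));
    }
}
```

Counting the names left in the domain after each test (as leftoverAfter does) would be one cheap way to check whether beans pile up, without needing a heap dump.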
