OK, I tracked this down... indeed we have a bug in trunk (but not in 3.x) whereby buffered deletes are never flushed, neither by RAM usage nor by count.
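
To make the failure mode concrete, here is a toy model of it. This is only a sketch under stated assumptions, not Lucene's actual code: the class BufferedDeletesSketch, its updateDocument method and the maxBufferedDeleteTerms field are hypothetical stand-ins that only count buffered delete terms. Each update buffers one delete term for its id, and a limit of 0 models the bug (no flush is ever triggered). It also suggests why a doc maker that repeats ids would hide the problem: repeated ids keep reusing the same buffered term.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: hypothetical names, not Lucene's real classes or internals.
class BufferedDeletesSketch {
    // Buffered delete terms: id -> number of docs seen when the delete was buffered.
    private final Map<String, Integer> bufferedDeleteTerms = new HashMap<>();
    // Flush-by-count limit; a value <= 0 means "never flush by count" (models the bug).
    private final int maxBufferedDeleteTerms;
    private int docCount = 0;
    private int peakBufferedTerms = 0;

    BufferedDeletesSketch(int maxBufferedDeleteTerms) {
        this.maxBufferedDeleteTerms = maxBufferedDeleteTerms;
    }

    // An update is modeled as "buffer a delete for this id, then add the doc".
    void updateDocument(String id) {
        docCount++;
        bufferedDeleteTerms.put(id, docCount); // a repeated id overwrites its own entry
        peakBufferedTerms = Math.max(peakBufferedTerms, bufferedDeleteTerms.size());
        if (maxBufferedDeleteTerms > 0
                && bufferedDeleteTerms.size() >= maxBufferedDeleteTerms) {
            bufferedDeleteTerms.clear(); // stands in for applying and flushing the deletes
        }
    }

    int peakBufferedTerms() {
        return peakBufferedTerms;
    }

    // Index numDocs docs, reusing each id repeatsPerId times in a row,
    // and report the largest number of delete terms held at once.
    static int run(int numDocs, int repeatsPerId, int maxBufferedDeleteTerms) {
        BufferedDeletesSketch writer = new BufferedDeletesSketch(maxBufferedDeleteTerms);
        for (int i = 0; i < numDocs; i++) {
            writer.updateDocument("doc" + (i / repeatsPerId));
        }
        return writer.peakBufferedTerms();
    }

    public static void main(String[] args) {
        System.out.println("repeated ids, never flushed:   " + run(100_000, 1000, 0));
        System.out.println("sequential ids, never flushed: " + run(100_000, 1, 0));
        System.out.println("sequential ids, flush at 1000: " + run(100_000, 1, 1000));
    }
}
```

In this model the buffer stays tiny when ids repeat, grows by one entry per document when ids are unique and nothing flushes, and stays bounded once a count trigger is honored.
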
I have unit tests that show the problem... I'll open an issue.

Mike McCandless

http://blog.mikemccandless.com

On Sun, Jul 24, 2011 at 6:36 AM, Mike McCandless <[email protected]> wrote:
> Not good! I'll dig once I'm back from vacation... sounds like something is
> up.
>
> Mike
>
> Sent from my iPad
>
> On Jul 23, 2011, at 4:24 PM, Mark Miller <[email protected]> wrote:
>
>> So eventually of course, after spending a few years in GC hell, you do hit
>> the OOM.
>>
>> On Jul 23, 2011, at 10:33 AM, Mark Miller wrote:
>>
>>> Alexey Serba pointed out an issue he was seeing to me last night. He said
>>> that when he used an older version of Solr to index millions of docs, the
>>> memory usage stayed quite low - but with a recent trunk version, the memory
>>> usage skyrocketed. No OOM that I have heard of or seen yet, but rather
>>> than cycling between 50 and a couple hundred megabytes of RAM, the usage
>>> jumps up to what is available. It doesn't drop back down until you do a
>>> commit.
>>>
>>> Interested, I started indexing millions of docs with my benchmark work. And
>>> I didn't see the problem. Based on some early profiling by Alexey, it
>>> looked like buffered deletes were involved (by default, Solr always uses
>>> update to maintain unique ids). I indexed about 13 million docs, and RAM
>>> usage looked nice. After a bit of digging though, I saw that the doc maker
>>> was not assigning ids sequentially for some reason - it was assigning the
>>> same id a bunch of times in a row before incrementing it. Odd - so I fixed
>>> this to increment on every document. And now I see the problem right away
>>> easily. Memory consumption just goes up, up, up and tops out near the max
>>> available.
>>>
>>> Still investigating. I have not tried with pure Lucene yet, but it looks
>>> like a pure Lucene issue to me so far. I see that in late June Mike fixed
>>> something related to buffered deletes - perhaps there is still something
>>> off in how RAM usage is tracked for deletes?
>>>
>>> - Mark Miller
>>> lucidimagination.com
>>>
>>
>> - Mark Miller
>> lucidimagination.com
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
>>
>
