OK, I tracked this down. We do have a bug in trunk (but not in 3.x)
whereby buffered deletes are never flushed by RAM usage or by count.

I have unit tests that show the problem... I'll open an issue.
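
For anyone following along, here is a rough toy model of the symptom. It is not Lucene's code and not the unit tests mentioned above. The class name, the `checkTriggers` flag, and the 64-byte per-term cost are all made up for illustration. It assumes buffered deletes are kept in a map keyed by term, so a repeated id overwrites its entry while a fresh id on every update adds a new one. That might also be why the benchmark with repeated ids looked fine, though that part is a guess.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of a buffered-delete queue. NOT Lucene's actual implementation. */
final class BufferedDeletesSketch {
    // Hypothetical per-entry cost, used only for rough RAM accounting here.
    static final long BYTES_PER_TERM = 64;

    private final Map<String, Integer> terms = new HashMap<>();
    private final int maxBufferedTerms;   // flush-by-count trigger
    private final long maxBufferedBytes;  // flush-by-RAM trigger
    private final boolean checkTriggers;  // false = triggers never consulted (the reported symptom)
    private int flushCount = 0;
    private int peakTerms = 0;

    BufferedDeletesSketch(int maxBufferedTerms, long maxBufferedBytes, boolean checkTriggers) {
        this.maxBufferedTerms = maxBufferedTerms;
        this.maxBufferedBytes = maxBufferedBytes;
        this.checkTriggers = checkTriggers;
    }

    /** Models an update: buffer a delete for the id, then flush if a trigger fires. */
    void update(String id, int docIdUpto) {
        terms.put(id, docIdUpto); // the same id overwrites its existing entry
        peakTerms = Math.max(peakTerms, terms.size());
        if (checkTriggers
                && (terms.size() >= maxBufferedTerms || ramBytesUsed() >= maxBufferedBytes)) {
            flush();
        }
    }

    long ramBytesUsed() { return terms.size() * BYTES_PER_TERM; }

    /** Drops all buffered deletes; a commit would end up here as well. */
    void flush() { terms.clear(); flushCount++; }

    int bufferedTerms() { return terms.size(); }
    int flushCount() { return flushCount; }
    int peakTerms() { return peakTerms; }

    public static void main(String[] args) {
        int docs = 100_000;
        BufferedDeletesSketch ignored = new BufferedDeletesSketch(1_000, Long.MAX_VALUE, false);
        BufferedDeletesSketch checked = new BufferedDeletesSketch(1_000, Long.MAX_VALUE, true);
        BufferedDeletesSketch repeated = new BufferedDeletesSketch(1_000, Long.MAX_VALUE, false);
        for (int i = 0; i < docs; i++) {
            ignored.update("id" + i, i);             // unique id per doc, triggers ignored
            checked.update("id" + i, i);             // unique id per doc, triggers honoured
            repeated.update("id" + (i / 1_000), i);  // same id 1000 times in a row
        }
        System.out.println("ignored:  peak=" + ignored.peakTerms() + " flushes=" + ignored.flushCount());
        System.out.println("checked:  peak=" + checked.peakTerms() + " flushes=" + checked.flushCount());
        System.out.println("repeated: peak=" + repeated.peakTerms() + " flushes=" + repeated.flushCount());
    }
}
```

In this sketch the "ignored" writer only shrinks when `flush()` is called explicitly, which is the stand-in for the commit that brings memory back down.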

Mike McCandless

http://blog.mikemccandless.com

On Sun, Jul 24, 2011 at 6:36 AM, Mike McCandless
<[email protected]> wrote:
> Not good!  I'll dig once I'm back from vacation... sounds like something is 
> up.
>
> Mike
>
> On Jul 23, 2011, at 4:24 PM, Mark Miller <[email protected]> wrote:
>
>> So eventually of course, after spending a few years in GC hell, you do hit 
>> the OOM.
>>
>> On Jul 23, 2011, at 10:33 AM, Mark Miller wrote:
>>
>>> Alexey Serba pointed out an issue he was seeing to me last night. He said 
>>> that when he used an older version of Solr to index millions of docs, the 
>>> memory usage stayed quite low - but with a recent trunk version, the memory 
>>> usage skyrocketed. No OOM that I have heard of or seen yet, but rather 
>>> than cycling between 50 and a couple hundred megabytes of RAM, the usage 
>>> jumps up to what is available. It doesn't drop back down until you do a 
>>> commit.
>>>
>>> Interested, I started indexing millions of docs with my benchmark work. And 
>>> I didn't see the problem. Based on some early profiling by Alexey, it 
>>> looked like buffered deletes were involved (by default, Solr always uses 
>>> update to maintain unique ids). I indexed about 13 million docs, and RAM 
>>> usage looked nice. After a bit of digging though, I saw that the doc maker 
>>> was not assigning ids sequentially for some reason - it was assigning the 
>>> same id a bunch of times in a row before incrementing it. Odd - so I fixed 
>>> this to increment on every document. And now I see the problem right 
>>> away. Memory consumption just goes up, up, up and tops out near the max 
>>> available.
>>>
>>> Still investigating. I have not tried with pure Lucene yet, but it looks 
>>> like a pure Lucene issue to me so far. I see that in late June Mike fixed 
>>> something related to buffered deletes - perhaps there is still something 
>>> off in how RAM usage is tracked for deletes?
>>>
>>> - Mark Miller
>>> lucidimagination.com
>>>
>>
>> - Mark Miller
>> lucidimagination.com
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
>>
>
