Lucene doesn't load everything into memory; it can carry on running consecutive searches or loading documents forever without hitting OOM exceptions. So if it isn't failing on a specific document, the most likely cause is that your program is hanging on to something it shouldn't. Previous docs? File handles? Lucene readers/searchers?
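
For what it's worth, below is roughly the shape I'd expect your
search-then-update loop to have. It's only a sketch against the 3.0-era
API: the field names ("word", "context"), the 1000-update reopen
interval and the wordsToIndex()/contextFor() helpers are guesses at
your setup, not real API. The two things that matter are that there is
exactly one IndexWriter for the whole run, and that every reader you
open gets closed.

import java.io.File;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class ContextIndexer {

    public static void main(String[] args) throws Exception {
        IndexWriter writer = new IndexWriter(
                FSDirectory.open(new File("context-index")),
                new StandardAnalyzer(Version.LUCENE_30),
                IndexWriter.MaxFieldLength.UNLIMITED);
        writer.setRAMBufferSizeMB(64);           // bounded RAM buffer

        // Near-real-time reader: sees this run's own updates.
        IndexReader reader = writer.getReader();
        IndexSearcher searcher = new IndexSearcher(reader);

        int count = 0;
        for (String word : wordsToIndex()) {     // stand-in data source
            String context = contextFor(word);   // stand-in data source

            // Look the word up; if it's already there, append to its
            // stored context.
            TopDocs hits = searcher.search(
                    new TermQuery(new Term("word", word)), 1);
            if (hits.totalHits > 0) {
                String old = searcher.doc(hits.scoreDocs[0].doc)
                        .get("context");
                context = old + " " + context;
            }

            // Rebuild the document from scratch and replace any old one.
            Document doc = new Document();
            doc.add(new Field("word", word,
                    Field.Store.YES, Field.Index.NOT_ANALYZED));
            doc.add(new Field("context", context,
                    Field.Store.YES, Field.Index.NO));
            writer.updateDocument(new Term("word", word), doc);

            // Swap in a fresh reader every so often and CLOSE the old
            // one -- leaked readers are a classic cause of a creeping
            // OOM.  (If the same word recurs within the interval the
            // searcher will be stale; reopen more often if so, at some
            // cost in speed.)
            if (++count % 1000 == 0) {
                IndexReader old = reader;
                reader = writer.getReader();
                searcher = new IndexSearcher(reader);
                old.close();
            }
        }
        searcher.close();
        reader.close();
        writer.close();
    }

    // Stand-ins for however you iterate your 90K source records.
    private static Iterable<String> wordsToIndex() {
        return java.util.Collections.emptyList();
    }

    private static String contextFor(String word) {
        return "";
    }
}

writer.getReader() gives a near-real-time reader, so searches see the
updates already made in this run without an explicit commit.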
--
Ian.


On Wed, Mar 3, 2010 at 12:12 PM, ajay_gupta <ajay...@gmail.com> wrote:
>
> Ian,
> The OOM exception point varies; it is not fixed. It can occur anywhere
> once memory use exceeds a certain point.
> I have allocated 1 GB of memory to the JVM. I haven't used a profiler.
> When I said it fails after 70K docs I meant approximately 70K documents,
> but if I reduce the memory it goes OOM before 70K, so it is not specific
> to any particular document.
> To add each document I first search and then update, so I am not sure
> whether Lucene loads all the indices for the search and that's why it is
> going OOM? I am not sure how the search operation works in Lucene.
>
>
> Thanks
> Ajay
>
>
> Ian Lea wrote:
>>
>> Where exactly are you hitting the OOM exception?  Have you got a stack
>> trace?  How much memory are you allocating to the JVM?  Have you run a
>> profiler to find out what is using the memory?
>>
>> If it runs OK for 70K docs then fails, two possibilities come to mind:
>> either the 70K + 1 doc is particularly large, or you or Lucene
>> (unlikely) are holding on to something that you shouldn't be.
>>
>>
>> --
>> Ian.
>>
>>
>> On Tue, Mar 2, 2010 at 1:48 PM, ajay_gupta <ajay...@gmail.com> wrote:
>>>
>>> Hi Erick,
>>> I tried setting setRAMBufferSizeMB to 200-500 MB as well, but it still
>>> hits the OOM error.
>>> I thought that since the indexing is file-based, memory shouldn't be
>>> an issue, but you might be right that searching uses a lot of memory.
>>> Is there a way to load documents in chunks, or some other way to make
>>> this scalable?
>>>
>>> Thanks in advance
>>> Ajay
>>>
>>>
>>> Erick Erickson wrote:
>>>>
>>>> I'm not following this entirely, but these docs may be huge by the
>>>> time you add context for every word in them. You say that you
>>>> "search the existing indices then I get the content and append....".
>>>> So is it possible that after 70K documents your additions become
>>>> so huge that you're blowing up? Have you taken any measurements
>>>> to determine how big the docs get as you index more and more
>>>> of them?
>>>>
>>>> If the above is off base, have you tried setting
>>>> IndexWriter.setRAMBufferSizeMB?
>>>>
>>>> HTH
>>>> Erick
>>>>
>>>> On Tue, Mar 2, 2010 at 8:27 AM, ajay_gupta <ajay...@gmail.com> wrote:
>>>>
>>>>>
>>>>> Hi,
>>>>> This may be a general question, but I couldn't find the answer yet.
>>>>> I have around 90K documents totalling around 350 MB. Each document
>>>>> contains a record with some text content. For each word in this
>>>>> text I want to store that word's context and index it, so I read
>>>>> each document and, for each word in it, append a fixed number of
>>>>> surrounding words. To do that I first search the existing index for
>>>>> the word; if it already exists I get the content, append the new
>>>>> context and update the document. If no context exists yet I create
>>>>> a document with the fields "word" and "context" and set their
>>>>> values to the word and its context.
>>>>>
>>>>> I tried this in RAM, but after a certain number of docs it gave an
>>>>> out-of-memory error, so I switched to the FSDirectory approach.
>>>>> Surprisingly, after 70K documents that also gave an OOM error. I
>>>>> have enough disk space, yet I still get this error, and I am not
>>>>> sure why disk-based indexing produces it. I thought disk-based
>>>>> indexing would be slow but at least scalable.
>>>>> Could someone suggest what the issue might be?
>>>>>
>>>>> Thanks
>>>>> Ajay
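
P.S. The cheap way to answer my earlier profiler question: rerun with a
heap dump on OOM and open the dump in any heap analyser; it will show
you what is pinning the memory. These are standard HotSpot flags, and
YourIndexer is just a placeholder for your entry-point class:

  java -Xmx1024m -XX:+HeapDumpOnOutOfMemoryError \
       -XX:HeapDumpPath=/tmp YourIndexer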