Ian, the point where the OOM exception is thrown varies; it isn't fixed. It can happen anywhere once memory usage passes a certain point. I have allocated 1 GB of memory to the JVM. I haven't used a profiler. When I said it fails after 70K docs, I meant approximately 70K documents, but if I reduce the memory it will OOM before 70K, so it isn't specific to any particular document. To add each document I first search and then do an update, so I am wondering whether Lucene loads all the indices into memory for the search and that is why it is going OOM? I am not sure how the search operation works in Lucene.
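In case it helps, this is roughly what happens per word (a simplified sketch of what I described, not my exact code -- the field names "word" and "context" are the ones from my first mail, the class and method names are just for illustration, I'm assuming the Lucene 3.0 API, and I've left out when/whether the searcher gets reopened to see recent updates):

  import java.io.File;
  import java.io.IOException;
  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.document.Document;
  import org.apache.lucene.document.Field;
  import org.apache.lucene.index.IndexWriter;
  import org.apache.lucene.index.Term;
  import org.apache.lucene.search.IndexSearcher;
  import org.apache.lucene.search.TermQuery;
  import org.apache.lucene.search.TopDocs;
  import org.apache.lucene.store.Directory;
  import org.apache.lucene.store.FSDirectory;
  import org.apache.lucene.util.Version;

  public class ContextIndexer {
      private final IndexWriter writer;
      private final IndexSearcher searcher;

      public ContextIndexer(File indexDir) throws IOException {
          Directory dir = FSDirectory.open(indexDir);
          writer = new IndexWriter(dir, new StandardAnalyzer(Version.LUCENE_30),
                                   IndexWriter.MaxFieldLength.UNLIMITED);
          // read-only searcher; it only sees the index as of the time it was opened
          searcher = new IndexSearcher(dir, true);
      }

      // Search for the word; if found, append to its stored context, then update.
      public void addContext(String word, String newContext) throws IOException {
          TopDocs hits = searcher.search(new TermQuery(new Term("word", word)), 1);
          String context = newContext;
          if (hits.totalHits > 0) {
              Document existing = searcher.doc(hits.scoreDocs[0].doc);
              context = existing.get("context") + " " + newContext;
          }
          Document doc = new Document();
          doc.add(new Field("word", word, Field.Store.YES, Field.Index.NOT_ANALYZED));
          doc.add(new Field("context", context, Field.Store.YES, Field.Index.ANALYZED));
          // updateDocument = delete-by-term + add, so each word keeps one document
          writer.updateDocument(new Term("word", word), doc);
      }
  }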
Thanks
Ajay

Ian Lea wrote:
>
> Where exactly are you hitting the OOM exception? Have you got a stack
> trace? How much memory are you allocating to the JVM? Have you run a
> profiler to find out what is using the memory?
>
> If it runs OK for 70K docs and then fails, two possibilities come to
> mind: either the 70K + 1 doc is particularly large, or you or Lucene
> (unlikely) are holding on to something that you shouldn't be.
>
> --
> Ian.
>
> On Tue, Mar 2, 2010 at 1:48 PM, ajay_gupta <ajay...@gmail.com> wrote:
>>
>> Hi Erick,
>> I tried setting setRAMBufferSizeMB to 200-500 MB as well, but it still
>> hits the OOM error. I thought that with file-based indexing memory
>> shouldn't be an issue, but you might be right that searching is using a
>> lot of memory. Is there a way to load documents in chunks, or some
>> other way to make this scalable?
>>
>> Thanks in advance
>> Ajay
>>
>> Erick Erickson wrote:
>>>
>>> I'm not following this entirely, but these docs may be huge by the
>>> time you add context for every word in them. You say that you
>>> "search the existing indices then I get the content and append....".
>>> So is it possible that after 70K documents your additions become
>>> so huge that you're blowing up? Have you taken any measurements
>>> to determine how big the docs get as you index more and more
>>> of them?
>>>
>>> If the above is off base, have you tried setting
>>> IndexWriter.setRAMBufferSizeMB?
>>>
>>> HTH
>>> Erick
>>>
>>> On Tue, Mar 2, 2010 at 8:27 AM, ajay_gupta <ajay...@gmail.com> wrote:
>>>>
>>>> Hi,
>>>> It might be a general question, but I couldn't find the answer yet.
>>>> I have around 90K documents totalling around 350 MB. Each document
>>>> contains a record with some text content. For each word in this text
>>>> I want to store and index that word's context, so I read each
>>>> document and, for each word in it, append a fixed number of
>>>> surrounding words. To do that, I first search the existing indices
>>>> for the word; if it already exists, I get the content, append the
>>>> new context, and update the document. If no context exists yet, I
>>>> create a document with the fields "word" and "context" and add those
>>>> two fields with the word value and context value.
>>>>
>>>> I tried this in RAM, but after a certain number of docs it gave an
>>>> out-of-memory error, so I switched to the FSDirectory method.
>>>> Surprisingly, after 70K documents it also gave an OOM error. I have
>>>> enough disk space, yet I still get this error, and I am not sure why
>>>> disk-based indexing would give it. I expected disk-based indexing to
>>>> be slow but at least scalable. Could someone suggest what the issue
>>>> might be?
>>>>
>>>> Thanks
>>>> Ajay
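PS - for completeness, this is how I am setting the buffer Erick suggested (again just a sketch, not my exact code; 256 is one of the values I tried between 200 and 500 MB, and the index path is made up):

  // writer setup with an explicit RAM buffer; buffered docs flush to disk
  // once the buffer fills up
  IndexWriter writer = new IndexWriter(
      FSDirectory.open(new File("/path/to/index")),
      new StandardAnalyzer(Version.LUCENE_30),
      IndexWriter.MaxFieldLength.UNLIMITED);
  writer.setRAMBufferSizeMB(256.0);
  // flush by RAM usage only, not by document count
  writer.setMaxBufferedDocs(IndexWriter.DISABLE_AUTO_FLUSH);

The JVM is started with -Xmx1024m, which is the 1 GB I mentioned above, so this buffer plus whatever the search side holds all has to fit in that heap.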