Lucene doesn't load everything into memory; it can carry on running consecutive searches or loading documents forever without hitting OOM exceptions. So if it isn't failing on a specific document, the most likely cause is that your program is hanging on to something it shouldn't. Previous docs? File handles? Lucene readers/searchers?
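
For what it's worth, below is roughly the shape I'd expect your
search-then-update loop to have. It's only a sketch against the 3.0-era
API: the field names ("word", "context"), the 1000-update reopen
interval and the wordsToIndex()/contextFor() helpers are guesses at
your setup, not real API. The two things that matter are that there is
exactly one IndexWriter for the whole run, and that every reader you
open gets closed.

import java.io.File;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class ContextIndexer {

    public static void main(String[] args) throws Exception {
        IndexWriter writer = new IndexWriter(
                FSDirectory.open(new File("context-index")),
                new StandardAnalyzer(Version.LUCENE_30),
                IndexWriter.MaxFieldLength.UNLIMITED);
        writer.setRAMBufferSizeMB(64);           // bounded RAM buffer

        // Near-real-time reader: sees this run's own updates.
        IndexReader reader = writer.getReader();
        IndexSearcher searcher = new IndexSearcher(reader);

        int count = 0;
        for (String word : wordsToIndex()) {     // stand-in data source
            String context = contextFor(word);   // stand-in data source

            // Look the word up; if it's already there, append to its
            // stored context.
            TopDocs hits = searcher.search(
                    new TermQuery(new Term("word", word)), 1);
            if (hits.totalHits > 0) {
                String old = searcher.doc(hits.scoreDocs[0].doc)
                        .get("context");
                context = old + " " + context;
            }

            // Rebuild the document from scratch and replace any old one.
            Document doc = new Document();
            doc.add(new Field("word", word,
                    Field.Store.YES, Field.Index.NOT_ANALYZED));
            doc.add(new Field("context", context,
                    Field.Store.YES, Field.Index.NO));
            writer.updateDocument(new Term("word", word), doc);

            // Swap in a fresh reader every so often and CLOSE the old
            // one -- leaked readers are a classic cause of a creeping
            // OOM.  (If the same word recurs within the interval the
            // searcher will be stale; reopen more often if so, at some
            // cost in speed.)
            if (++count % 1000 == 0) {
                IndexReader old = reader;
                reader = writer.getReader();
                searcher = new IndexSearcher(reader);
                old.close();
            }
        }
        searcher.close();
        reader.close();
        writer.close();
    }

    // Stand-ins for however you iterate your 90K source records.
    private static Iterable<String> wordsToIndex() {
        return java.util.Collections.emptyList();
    }

    private static String contextFor(String word) {
        return "";
    }
}

writer.getReader() gives a near-real-time reader, so searches see the
updates already made in this run without an explicit commit.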
--
Ian.


On Wed, Mar 3, 2010 at 12:12 PM, ajay_gupta <ajay...@gmail.com> wrote:
>
> Ian,
> The OOM exception point varies; it is not fixed. It can occur anywhere
> once memory use exceeds a certain point.
> I have allocated 1 GB of memory to the JVM. I haven't used a profiler.
> When I said it fails after 70K docs I meant approximately 70K documents,
> but if I reduce the memory it goes OOM before 70K, so it is not specific
> to any particular document.
> To add each document I first search and then update, so I am not sure
> whether Lucene loads all the indices for the search and that's why it is
> going OOM? I am not sure how the search operation works in Lucene.
>
>
> Thanks
> Ajay
>
>
> Ian Lea wrote:
>>
>> Where exactly are you hitting the OOM exception?  Have you got a stack
>> trace?  How much memory are you allocating to the JVM?  Have you run a
>> profiler to find out what is using the memory?
>>
>> If it runs OK for 70K docs then fails, two possibilities come to mind:
>> either the 70K + 1 doc is particularly large, or you or Lucene
>> (unlikely) are holding on to something that you shouldn't be.
>>
>>
>> --
>> Ian.
>>
>>
>> On Tue, Mar 2, 2010 at 1:48 PM, ajay_gupta <ajay...@gmail.com> wrote:
>>>
>>> Hi Erick,
>>> I tried setting setRAMBufferSizeMB to 200-500 MB as well, but it still
>>> hits the OOM error.
>>> I thought that since the indexing is file-based, memory shouldn't be
>>> an issue, but you might be right that searching uses a lot of memory.
>>> Is there a way to load documents in chunks, or some other way to make
>>> this scalable?
>>>
>>> Thanks in advance
>>> Ajay
>>>
>>>
>>> Erick Erickson wrote:
>>>>
>>>> I'm not following this entirely, but these docs may be huge by the
>>>> time you add context for every word in them. You say that you
>>>> "search the existing indices then I get the content and append....".
>>>> So is it possible that after 70K documents your additions become
>>>> so huge that you're blowing up? Have you taken any measurements
>>>> to determine how big the docs get as you index more and more
>>>> of them?
>>>>
>>>> If the above is off base, have you tried setting
>>>> IndexWriter.setRAMBufferSizeMB?
>>>>
>>>> HTH
>>>> Erick
>>>>
>>>> On Tue, Mar 2, 2010 at 8:27 AM, ajay_gupta <ajay...@gmail.com> wrote:
>>>>
>>>>>
>>>>> Hi,
>>>>> This may be a general question, but I couldn't find the answer yet.
>>>>> I have around 90K documents totalling around 350 MB. Each document
>>>>> contains a record with some text content. For each word in this
>>>>> text I want to store that word's context and index it, so I read
>>>>> each document and, for each word in it, append a fixed number of
>>>>> surrounding words. To do that I first search the existing index for
>>>>> the word; if it already exists I get the content, append the new
>>>>> context and update the document. If no context exists yet I create
>>>>> a document with the fields "word" and "context" and set their
>>>>> values to the word and its context.
>>>>>
>>>>> I tried this in RAM, but after a certain number of docs it gave an
>>>>> out-of-memory error, so I switched to the FSDirectory approach.
>>>>> Surprisingly, after 70K documents that also gave an OOM error. I
>>>>> have enough disk space, yet I still get this error, and I am not
>>>>> sure why disk-based indexing produces it. I thought disk-based
>>>>> indexing would be slow but at least scalable.
>>>>> Could someone suggest what the issue might be?
>>>>>
>>>>> Thanks
>>>>> Ajay
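
P.S. The cheap way to answer my earlier profiler question: rerun with a
heap dump on OOM and open the dump in any heap analyser; it will show
you what is pinning the memory. These are standard HotSpot flags, and
YourIndexer is just a placeholder for your entry-point class:

  java -Xmx1024m -XX:+HeapDumpOnOutOfMemoryError \
       -XX:HeapDumpPath=/tmp YourIndexer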