Where exactly are you hitting the OOM exception? Have you got a stack trace? How much memory are you allocating to the JVM? Have you run a profiler to find out what is using the memory?
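If you haven't measured anything yet, this is roughly how I'd capture the failure on a HotSpot JVM. The heap size, dump path, and main class below are placeholders for whatever your setup actually uses:

    java -Xmx1024m \
         -XX:+HeapDumpOnOutOfMemoryError \
         -XX:HeapDumpPath=/tmp/indexer.hprof \
         -cp lucene-core.jar:. YourIndexer

Then open the .hprof file in a heap analyzer (Eclipse MAT, or jhat from the JDK) to see which objects dominate the heap at the moment of the OOM.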
If it runs OK for 70K docs and then fails, two possibilities come to mind: either the 70K+1st doc is particularly large, or you or Lucene (unlikely) are holding on to something that you shouldn't be.

--
Ian.

On Tue, Mar 2, 2010 at 1:48 PM, ajay_gupta <[email protected]> wrote:
>
> Hi Erick,
> I tried setting setRAMBufferSizeMB to 200-500 MB as well, but it still
> fails with an OOM error.
> I thought that since the indexing is file-based, memory shouldn't be an
> issue, but you might be right that the searching uses a lot of memory.
> Is there a way to load documents in chunks, or some other way to make
> this scalable?
>
> Thanks in advance
> Ajay
>
>
> Erick Erickson wrote:
>>
>> I'm not following this entirely, but these docs may be huge by the
>> time you add context for every word in them. You say that you
>> "search the existing indices then I get the content and append...".
>> So is it possible that after 70K documents your additions become
>> so huge that you're blowing up? Have you taken any measurements
>> to determine how big the docs get as you index more and more
>> of them?
>>
>> If the above is off base, have you tried setting
>> IndexWriter.setRAMBufferSizeMB?
>>
>> HTH
>> Erick
>>
>> On Tue, Mar 2, 2010 at 8:27 AM, ajay_gupta <[email protected]> wrote:
>>
>>>
>>> Hi,
>>> This might be a general question, but I couldn't find the answer yet.
>>> I have around 90K documents totalling around 350 MB. Each document
>>> contains a record with some text content. For each word in this text
>>> I want to store and index that word's context, so I read each document
>>> and, for each word in it, append a fixed number of surrounding words.
>>> To do that, I first search the existing indices for the word; if it
>>> already exists, I get the content, append the new context, and update
>>> the document. If no context exists yet, I create a document with the
>>> fields "word" and "context" and set their values to the word and its
>>> context.
>>>
>>> I tried this in RAM, but after a certain number of docs it gave an
>>> out-of-memory error, so I switched to the FSDirectory method.
>>> Surprisingly, after 70K documents it also gave an OOM error. I have
>>> enough disk space, but I am still getting this error, and I am not
>>> sure why even disk-based indexing fails this way. I expected
>>> disk-based indexing to be slow, but at least scalable.
>>> Could someone suggest what the issue could be?
>>>
>>> Thanks
>>> Ajay
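For completeness, here is a rough sketch of what Erick's setRAMBufferSizeMB suggestion plus chunked commits could look like against the Lucene 3.0 API. The index path, commit interval, and loadRecords() source are illustrative stand-ins, not tested against your code; the "word" and "context" field names are the ones from the thread:

import java.io.File;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class ContextIndexer {

    public static void main(String[] args) throws Exception {
        // Disk-backed index; the path is illustrative.
        IndexWriter writer = new IndexWriter(
                FSDirectory.open(new File("/tmp/context-index")),
                new StandardAnalyzer(Version.LUCENE_30),
                true,                                  // create a new index
                IndexWriter.MaxFieldLength.UNLIMITED);

        // Bound the in-memory buffer; Lucene flushes segments to disk
        // once buffered docs exceed this size, so the writer's heap
        // usage stays roughly flat regardless of corpus size.
        writer.setRAMBufferSizeMB(64.0);

        int count = 0;
        for (String[] rec : loadRecords()) {           // placeholder source: {word, context}
            Document doc = new Document();
            doc.add(new Field("word", rec[0], Field.Store.YES, Field.Index.NOT_ANALYZED));
            doc.add(new Field("context", rec[1], Field.Store.YES, Field.Index.ANALYZED));
            writer.addDocument(doc);

            // Commit in chunks so partial work is durable and buffered
            // state is released periodically; the interval is arbitrary.
            if (++count % 10000 == 0) {
                writer.commit();
            }
        }
        writer.close();
    }

    // Stand-in for however the records are actually produced.
    private static List<String[]> loadRecords() {
        return new ArrayList<String[]>();
    }
}

Note that a bounded RAM buffer only caps what the writer itself holds. If each "context" value keeps growing as you re-read and append to it, the strings you build on the Java side can exhaust the heap on their own, which is Erick's point and worth profiling separately.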
