If you're CPU-bound - I've had issues before with GC in long-running indexing tasks loading very large volumes (100s of millions) of docs. I was seeing lots of CPU usage tied up in GC.
I solved all these problems by firing batches of indexing activity off in seperate processes then immediately killing them after use. This is the most effective form of garbage collection you can get... I don't see any advantage (other than hotspotting) of keeping any of the resources created during an indexing batch after that batch has been committed. Why burn CPU cycles garbage collecting when you know there is nothing of value to be retained? Clearly this may not be an approach where batches of updates are committed very frequently (i.e. when measured in seconds not minutes) but for heavy ingest this approach can sit quite happily alongside a seperate long-running process used for servicing searches without loading the search process with the task of garbage collecting the indexing debris. The search process can periodically reopen IndexReader to see the new segments created in other indexing processes. Cheers Mark ----- Original Message ---- From: Erick Erickson <erickerick...@gmail.com> To: java-user@lucene.apache.org Sent: Thursday, 30 April, 2009 14:46:21 Subject: Re: Indexing becomes slow with time This is surprising behavior, which is another way of saying that, given what you've said so far, this shouldn't be happening. I'd really look at system metrics, like whether you're swapping etc. In particular you might want to try varying how big you allow your memory footprint to grow before you flush, this is in the doc Ian pointed out under * Flush by RAM usage instead of document count* There's no need to periodically optimize, just do that at the end if you must. Best Erick On Thu, Apr 30, 2009 at 6:23 AM, liat oren <oren.l...@gmail.com> wrote: > Yes, I do run optimize... > > I did start looking at these tips in the last few days, but didn't think > the > optimize makes it so slow. > > Thanks! > > 2009/4/30 Ian Lea <ian....@gmail.com> > > > Are you maybe running optimize after every n documents? There are > > lots of tips in > > http://wiki.apache.org/lucene-java/ImproveIndexingSpeed. > > > > > > -- > > Ian. > > > > > > On Thu, Apr 30, 2009 at 8:29 AM, liat oren <oren.l...@gmail.com> wrote: > > > Hi, > > > > > > I noticed that when I start to index, it indexes 7 documents a second. > > After > > > 30 minutes it goes down to 3 documents a second. > > > After two hours it becomes very slow (I stopped it when it arrived to > > 320MB > > > and did 1 document in almost a minute) > > > > > > As you can see, it happens only after 2000, 3000 documnet. > > > Should I split them into more indexes? > > > > > > > > > Thanks, > > > Liat > > > > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > > For additional commands, e-mail: java-user-h...@lucene.apache.org > > > > > --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org