+1. I'll open an issue.

Mike

On Fri, Nov 20, 2009 at 8:11 AM, Yonik Seeley <yo...@lucidimagination.com> wrote:
> Thanks Bogdan, I've been meaning to bring this up.
> Solr used a TreeMap in the past (when it handled its own deletes) for
> the same exact reason. In my profiling, I've also seen applyDeletes()
> taking the bulk of the time with small/simple document indexing.
>
> So we should definitely go in sorted order (either via TreeMap or by
> sorting the HashMap).
>
> -Yonik
> http://www.lucidimagination.com
>
> On Fri, Nov 20, 2009 at 7:21 AM, Bogdan Ghidireac <bog...@ecstend.com> wrote:
>> Hi,
>>
>> One of the use cases of my application involves updating the index with
>> 10 to 10k docs every few minutes. Because we maintain a PK for each
>> doc we have to use IndexWriter.updateDocument to be consistent.
>>
>> The average time for an update when we commit every 10k docs is around
>> 17ms (the IndexWriter buffer is 100MB). I profiled the application for
>> several hours and I noticed that most of the time is spent in
>> IndexWriter.applyDeletes()->TermDocs.seek(). I changed the
>> BufferedDeletes.terms from HashMap to TreeMap to have the terms
>> ordered and to reduce the number of random seeks on the disk.
>>
>> I ran my tests again with the patched Lucene 2.9.1 and the time has
>> dropped from 17ms to 2ms. The index is 18GB with 70 million docs.
>>
>> I cannot send a patch because my company has some strict and time
>> consuming policies about open source but the change is small and can
>> be applied easily.
>>
>> Regards,
>> Bogdan
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-dev-h...@lucene.apache.org
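
For anyone following the thread, here is a minimal sketch of the idea Bogdan and Yonik describe. It is not the actual BufferedDeletes code: the Term class below is a hypothetical stand-in for Lucene's Term, and the Integer value is just a placeholder for whatever is buffered per term. It only illustrates that copying the buffered delete terms into a TreeMap makes the delete loop visit them in sorted order, whereas HashMap iteration order is unspecified, which is presumably why the seeks looked random.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SortedDeletesSketch {

    /** Hypothetical stand-in for a Lucene term: a field name plus a text value. */
    static final class Term implements Comparable<Term> {
        final String field;
        final String text;

        Term(String field, String text) {
            this.field = field;
            this.text = text;
        }

        /** Orders by field first, then by text (assumed to mirror term dictionary order). */
        @Override
        public int compareTo(Term other) {
            int c = field.compareTo(other.field);
            return c != 0 ? c : text.compareTo(other.text);
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Term)) return false;
            Term t = (Term) o;
            return field.equals(t.field) && text.equals(t.text);
        }

        @Override
        public int hashCode() {
            return 31 * field.hashCode() + text.hashCode();
        }

        @Override
        public String toString() {
            return field + ":" + text;
        }
    }

    /** Returns the terms in the order a delete loop iterating this map would see them. */
    static List<Term> visitOrder(Map<Term, Integer> bufferedDeletes) {
        return new ArrayList<>(bufferedDeletes.keySet());
    }

    /** Copies the buffered deletes into a TreeMap so iteration follows compareTo order. */
    static Map<Term, Integer> sortedBuffer(Map<Term, Integer> unsorted) {
        return new TreeMap<>(unsorted);
    }

    public static void main(String[] args) {
        // Primary-key terms buffered in arrival order, as updateDocument calls would add them.
        Map<Term, Integer> deletes = new HashMap<>();
        String[] ids = {"doc-0913", "doc-0007", "doc-0450", "doc-0002"};
        for (int i = 0; i < ids.length; i++) {
            deletes.put(new Term("id", ids[i]), i);
        }

        // HashMap order depends on hashing; TreeMap order is sorted by field, then text.
        System.out.println("HashMap order: " + visitOrder(deletes));
        System.out.println("TreeMap order: " + visitOrder(sortedBuffer(deletes)));
    }
}
```

The alternative Yonik mentions, keeping the HashMap and sorting its keys once before applying deletes, should give the same visit order; which one is cheaper in practice would need measuring.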