+1

I'll open an issue.

Mike

On Fri, Nov 20, 2009 at 8:11 AM, Yonik Seeley
<yo...@lucidimagination.com> wrote:
> Thanks Bogdan, I've been meaning to bring this up.
> Solr used a TreeMap in the past (when it handled its own deletes) for
> the same exact reason.  In my profiling, I've also seen applyDeletes()
> taking the bulk of the time with small/simple document indexing.
>
> So we should definitely go in sorted order (either via a TreeMap or by
> sorting the HashMap).
>
> -Yonik
> http://www.lucidimagination.com
>
> On Fri, Nov 20, 2009 at 7:21 AM, Bogdan Ghidireac <bog...@ecstend.com> wrote:
>> Hi,
>>
>> One of the use cases of my application involves updating the index with
>> 10 to 10k docs every few minutes. Because we maintain a PK for each
>> doc we have to use IndexWriter.updateDocument to be consistent.
>>
>> The average time for an update when we commit every 10k docs is around
>> 17ms (the IndexWriter buffer is 100MB). I profiled the application for
>> several hours and I noticed that most of the time is spent in
>> IndexWriter.applyDeletes()->TermDocs.seek(). I changed the
>> BufferedDeletes.terms from HashMap to TreeMap to have the terms
>> ordered and to reduce the number of random seeks on the disk.
>>
>> I ran my tests again with the patched Lucene 2.9.1 and the time
>> dropped from 17ms to 2ms. The index is 18GB and holds 70 million docs.
>>
>> I cannot send a patch because my company has some strict and
>> time-consuming policies about open source, but the change is small and
>> can be applied easily.
>>
>> Regards,
>> Bogdan
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-dev-h...@lucene.apache.org
>
>
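
For illustration, the effect discussed in this thread can be sketched
with plain java.util collections. This is not Lucene code and not
Bogdan's patch: the class SortedDeletesSketch, its methods, and the
zero-padded "pk" keys are all hypothetical. The sketch models the term
dictionary as a sorted list and counts how often a lookup lands before
the previous one, on the assumption that such backward jumps are what
turn into random seeks on disk.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not Lucene code: delete terms are modelled as plain
// String keys and the term dictionary as a sorted List<String>.
final class SortedDeletesSketch {

    // Counts lookups that land before the previous hit in the sorted
    // dictionary, i.e. the points where a reader would have to jump backwards.
    static int backwardSeeks(Iterable<String> deleteTerms, List<String> sortedDictionary) {
        int backward = 0;
        int previous = -1;
        for (String term : deleteTerms) {
            int position = Collections.binarySearch(sortedDictionary, term);
            if (position < 0) {
                continue; // term is absent, so there is nothing to delete
            }
            if (position < previous) {
                backward++;
            }
            previous = position;
        }
        return backward;
    }

    // Builds zero-padded keys so that lexicographic order equals numeric order.
    static List<String> primaryKeys(int count) {
        List<String> keys = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            keys.add(String.format(Locale.ROOT, "pk%06d", i));
        }
        return keys;
    }

    public static void main(String[] args) {
        List<String> dictionary = primaryKeys(10_000);

        // Buffer the same delete terms in both map types.
        Map<String, Integer> hashed = new HashMap<>();
        Map<String, Integer> sorted = new TreeMap<>();
        for (int i = 0; i < dictionary.size(); i += 7) {
            hashed.put(dictionary.get(i), i);
            sorted.put(dictionary.get(i), i);
        }

        // Alternative from the thread: keep the HashMap, sort its keys once.
        List<String> sortedKeys = new ArrayList<>(hashed.keySet());
        Collections.sort(sortedKeys);

        System.out.println("HashMap order: " + backwardSeeks(hashed.keySet(), dictionary));
        System.out.println("TreeMap order: " + backwardSeeks(sorted.keySet(), dictionary));
        System.out.println("Sorted keys:   " + backwardSeeks(sortedKeys, dictionary));
    }
}
```

The TreeMap and sorted-key variants never step backwards by
construction; the HashMap count depends on hash order. Whether that
translates into the 17ms to 2ms improvement reported above depends on
the index and the hardware, which this sketch does not model.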
