[jira] Commented: (LUCENE-994) Change defaults in IndexWriter to maximize "out of the box" performance

Mark Miller (JIRA) Thu, 25 Oct 2007 11:27:15 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12537688 ]


Mark Miller commented on LUCENE-994:
------------------------------------

Sorry Yonik...I was not being explicit enough -- I *am* closing the Writer 
before opening the Reader, which is why I assumed I could count on this 
behavior. It is now failing semi-randomly in my app, though. I am not positive 
it is due to Lucene; I just thought that maybe the concurrent merge was somehow 
not adding the document before triggering the merge in a background thread. 
Perhaps you don't see the doc until the background threads are done merging? I 
am just looking for someone to tell me that no, even with concurrent merge, as 
long as you close the writer and then open a new reader, you are guaranteed to 
find the doc just added (if all from the same thread). I really do assume this 
is the case; I just have not changed anything other than updating Lucene, so I 
am grasping at straws...

> Change defaults in IndexWriter to maximize "out of the box" performance
> -----------------------------------------------------------------------
>
>                 Key: LUCENE-994
>                 URL: https://issues.apache.org/jira/browse/LUCENE-994
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.3
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.3
>
>         Attachments: LUCENE-994.patch, writerinfo.zip
>
>
> This is follow-through from LUCENE-845, LUCENE-847 and LUCENE-870;
> I'll commit this once those three are committed.
> Out of the box performance of IndexWriter is maximized when flushing
> by RAM instead of a fixed document count (the default today) because
> documents can vary greatly in size.
> Likewise, merging performance should be faster when merging by net
> segment size since, to minimize the net IO cost of merging segments
> over time, you want to merge segments of equal byte size.
> Finally, ConcurrentMergeScheduler improves indexing speed
> substantially (25% in a simple initial test in LUCENE-870) because it
> runs the merges in the background and doesn't block
> add/update/deleteDocument calls.  Most machines have concurrency
> between CPU and IO and so it makes sense to default to this
> MergeScheduler.
> Note that these changes will break users of ParallelReader because the
> parallel indices will no longer have matching docIDs.  Such users need
> to switch IndexWriter back to flushing by doc count, and switch the
> MergePolicy back to LogDocMergePolicy.  It's likely also necessary to
> switch the MergeScheduler back to SerialMergeScheduler to ensure
> deterministic docID assignment.
> I think the combination of these three default changes, plus other
> performance improvements for indexing (LUCENE-966, LUCENE-843,
> LUCENE-963, LUCENE-969, LUCENE-871, etc.) should make for some sizable
> performance gains in Lucene 2.3!

