[jira] Commented: (LUCENE-2312) Search on IndexWriter's RAM Buffer

Michael McCandless (JIRA) Sun, 14 Mar 2010 03:11:55 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845061#action_12845061 ]


Michael McCandless commented on LUCENE-2312:
--------------------------------------------

bq. IW commitMerge calls docWriter's remapDeletes, a synchronized method to 
prevent concurrent updates. I'm not sure how we should efficiently block calls 
to the different DW's.

Yeah this is because when we buffer a delete Term/Query, the docID we store 
against it is absolute.  It *seems* like it could/should be relative (ie, 
within the RAM segment), then remapping wouldn't be needed when a merge 
commits.  I think?
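
To make the relative-docID idea concrete, here is a minimal sketch.  It is 
not Lucene's actual DocumentsWriter code: the class RamSegmentDeletes and all 
of its methods are hypothetical names invented for illustration.  Each RAM 
segment records, per delete term, how many docs it held when the delete was 
buffered; the absolute docID is derived from the segment's current doc base 
only when it is needed.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of per-RAM-segment buffered deletes (not Lucene code). */
final class RamSegmentDeletes {
    // Number of docs buffered so far in this RAM segment.
    private int numDocs;

    // Delete term -> relative docID limit: the delete applies to docs
    // whose relative docID is below this limit.
    private final Map<String, Integer> deleteLimits = new HashMap<>();

    /** Buffers a document and returns its segment-relative docID. */
    int addDocument() {
        return numDocs++;
    }

    /** Buffers a delete that covers only docs already in this segment. */
    void bufferDeleteTerm(String term) {
        deleteLimits.put(term, numDocs);
    }

    /** Returns true if the buffered delete for term covers relativeDocID. */
    boolean isDeleted(String term, int relativeDocID) {
        Integer limit = deleteLimits.get(term);
        return limit != null && relativeDocID < limit;
    }

    /** Derives an absolute docID on demand from the segment's current base. */
    static int toAbsolute(int docBase, int relativeDocID) {
        return docBase + relativeDocID;
    }
}
```

If a committed merge compacts earlier segments and the doc base drops from, 
say, 100 to 60, nothing stored in the segment changes; only the base passed 
to toAbsolute does.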

bq. _mergeInit calls docWriter getDocStoreSegment - unsure what to change

It wouldn't anymore once we have private RAM segments: we would no longer share 
doc stores across segments, meaning merging will always merge doc stores, so 
there's no need to call that method or to keep all the logic in SegmentMerger 
that determines whether doc store merging is required.

This will necessarily be a perf hit when building a large index from scratch 
in a single IW session.  Today such a session creates one large set of doc 
stores and never has to merge it while building.  This is the biggest perf 
downside to this change, I think.

But maybe the perf loss will not be so bad, because of bulk merging, in the 
case when all docs always add the same fields in the same order.  Or... if we 
could fix Lucene to always bind the same field name to the same field number 
(LUCENE-1737) then we'd always bulk-merge regardless of which fields the app 
adds to docs, and in which order.
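
A rough sketch of why a stable name-to-number binding helps.  This is only an 
illustration of the idea, not the LUCENE-1737 patch: FieldNumbers and 
numberFor are hypothetical names.  When every segment numbers its fields 
independently, the numbering depends on the order in which fields first 
appear; when all segments go through one shared map, the numbering always 
matches.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical name-to-number registry (illustration only, not Lucene code). */
final class FieldNumbers {
    // Field name -> field number, in order of first appearance.
    private final Map<String, Integer> byName = new LinkedHashMap<>();

    /** Returns the number already bound to name, or binds the next free one. */
    synchronized int numberFor(String name) {
        Integer number = byName.get(name);
        if (number == null) {
            number = byName.size();
            byName.put(name, number);
        }
        return number;
    }
}
```

With one FieldNumbers per segment, a segment that sees "title" then "body" 
and another that sees "body" then "title" disagree on the numbering, so their 
stored fields can't simply be copied as-is.  With a single shared instance, 
both segments get the same numbers no matter the order.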

bq. Some of the config settings (such as maxBufferedDocs) can simply be removed 
from DW, and instead accessed via WriterConfig

Ahh, you mean push IWC down to DW?  That sounds great.
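
Something along these lines, as a minimal sketch.  WriterConfigSketch and 
DocWriterSketch are hypothetical stand-ins for IWC and DW, not the real 
classes, and the default of 1000 is an arbitrary value for the example.  The 
point is that DW keeps no copy of maxBufferedDocs; it asks the config each 
time, so a changed setting takes effect without a setter on DW.

```java
/** Hypothetical stand-in for the writer config (not the real IWC). */
final class WriterConfigSketch {
    // Arbitrary default chosen for this example.
    private volatile int maxBufferedDocs = 1000;

    int getMaxBufferedDocs() {
        return maxBufferedDocs;
    }

    WriterConfigSketch setMaxBufferedDocs(int maxBufferedDocs) {
        this.maxBufferedDocs = maxBufferedDocs;
        return this;
    }
}

/** Hypothetical stand-in for DW that reads settings through the config. */
final class DocWriterSketch {
    private final WriterConfigSketch config;
    private int numBufferedDocs;

    DocWriterSketch(WriterConfigSketch config) {
        this.config = config;
    }

    /** Buffers one doc; returns true if that triggered a (simulated) flush. */
    boolean addDocument() {
        numBufferedDocs++;
        if (numBufferedDocs >= config.getMaxBufferedDocs()) {
            numBufferedDocs = 0; // simulated flush: the buffer is emptied
            return true;
        }
        return false;
    }
}
```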

> Search on IndexWriter's RAM Buffer
> ----------------------------------
>
>                 Key: LUCENE-2312
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2312
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>    Affects Versions: 3.0.1
>            Reporter: Jason Rutherglen
>            Assignee: Michael Busch
>             Fix For: 3.1
>
>
> In order to offer users near realtime search, without incurring
> an indexing performance penalty, we can implement search on
> IndexWriter's RAM buffer. This is the buffer that is filled in
> RAM as documents are indexed. Currently the RAM buffer is
> flushed to the underlying directory (usually disk) before being
> made searchable. 
> Today's Lucene-based NRT systems must incur the cost of merging
> segments, which can slow indexing. 
> Michael Busch has good suggestions regarding how to handle deletes using max 
> doc ids.  
> https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841923&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841923
> The area that isn't fully fleshed out is the terms dictionary,
> which needs to be sorted prior to queries executing. Currently
> IW implements a specialized hash table. Michael B has a
> suggestion here: 
> https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841915&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841915

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

