[
https://issues.apache.org/jira/browse/LUCENE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845061#action_12845061
]
Michael McCandless commented on LUCENE-2312:
--------------------------------------------
bq. IW commitMerge calls docWriter's remapDeletes, a synchronized method to
prevent concurrent updates. I'm not sure how we should efficiently block calls
to the different DW's.
Yeah, this is because when we buffer a delete Term/Query, the docID we store
against it is absolute. It *seems* like it could/should be relative (i.e.,
within the RAM segment); then remapping wouldn't be needed when a merge
commits. I think?
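
To make the relative-docID idea concrete, here is a minimal sketch. It is not Lucene's actual DocumentsWriter code: the class RamSegmentDeletes and all of its methods are hypothetical names invented for illustration. It assumes each buffered delete term records how many docs were in its own RAM segment when the delete arrived, and that an absolute docID is only computed on demand from a doc base.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of segment-relative buffered deletes (not Lucene's API). */
class RamSegmentDeletes {
    // term -> count of docs in THIS RAM segment that existed when the delete was buffered
    private final Map<String, Integer> deleteUpto = new HashMap<>();
    private int numDocsInRam = 0;

    /** Adds a doc and returns its docID relative to this RAM segment. */
    int addDocument() {
        return numDocsInRam++;
    }

    /** Buffers a delete that covers only docs already in this segment. */
    void bufferDeleteTerm(String term) {
        deleteUpto.put(term, numDocsInRam);
    }

    /** True if the buffered delete for the term covers the given relative docID. */
    boolean deleteApplies(String term, int relativeDocId) {
        Integer upto = deleteUpto.get(term);
        return upto != null && relativeDocId < upto;
    }

    /** An absolute docID is derived only when needed, from the current doc base. */
    static int toAbsolute(int docBase, int relativeDocId) {
        return docBase + relativeDocId;
    }
}
```

If a merge of already-flushed segments commits and the doc base of this RAM segment drops from, say, 100 to 80, nothing stored in deleteUpto has to change; only the docBase passed to toAbsolute differs. With absolute docIDs, every stored limit would need remapping at that point.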
bq. _mergeInit calls docWriter getDocStoreSegment - unsure what to change
It wouldn't anymore once we have private RAM segments: we would no longer share
doc stores across segments, meaning merging will always merge doc stores and
there's no need to call that method nor have all the logic in SegmentMerger to
determine whether doc store merging is required.
This will necessarily be a perf hit when building a large index from
scratch in a single IW session. Today that index creates one large set of doc
stores and never has to merge it while building. This is the biggest perf
downside to this change, I think.
But maybe the perf loss will not be so bad, because of bulk merging, in the
case when all docs always add the same fields in the same order. Or... if we
could fix Lucene to always bind the same field name to the same field number
(LUCENE-1737) then we'd always bulk-merge regardless of which fields the app
adds to docs and in which order.
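
As a rough illustration of why a fixed name-to-number binding helps, here is a small sketch. The class FieldNumbering and its methods are hypothetical and are not Lucene's FieldInfos API. It also assumes a simplified bulk-copy condition, namely that two segments have identical name-to-number maps; the real check in SegmentMerger may differ in detail.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch comparing per-segment and shared field numbering. */
class FieldNumbering {
    // Shared registry: a field name keeps the first number it was ever given.
    private final Map<String, Integer> global = new HashMap<>();

    /** Per-segment numbering: fields are numbered in order of first appearance. */
    static Map<String, Integer> numberLocally(List<String> fieldsInArrivalOrder) {
        Map<String, Integer> numbers = new HashMap<>();
        for (String name : fieldsInArrivalOrder) {
            numbers.putIfAbsent(name, numbers.size());
        }
        return numbers;
    }

    /** Returns the shared number for a field name, assigning the next free one if new. */
    synchronized int globalNumber(String name) {
        Integer n = global.get(name);
        if (n == null) {
            n = global.size();
            global.put(name, n);
        }
        return n;
    }

    /** Numbers one segment's fields through the shared registry. */
    Map<String, Integer> numberGlobally(List<String> fieldsInArrivalOrder) {
        Map<String, Integer> numbers = new HashMap<>();
        for (String name : fieldsInArrivalOrder) {
            numbers.put(name, globalNumber(name));
        }
        return numbers;
    }

    /** Assumed, simplified bulk-copy condition: both segments number fields identically. */
    static boolean sameNumbering(Map<String, Integer> a, Map<String, Integer> b) {
        return a.equals(b);
    }
}
```

With per-segment numbering, a segment that saw ("title", "body") and one that saw ("body", "title") end up with different maps, so under the assumed condition they could not be bulk-merged. Routed through one shared FieldNumbering instance, both get title=0 and body=1 regardless of arrival order.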
bq. Some of the config settings (such as maxBufferedDocs) can simply be removed
from DW, and instead accessed via WriterConfig
Ahh, you mean push IWC down to DW? That sounds great.
> Search on IndexWriter's RAM Buffer
> ----------------------------------
>
> Key: LUCENE-2312
> URL: https://issues.apache.org/jira/browse/LUCENE-2312
> Project: Lucene - Java
> Issue Type: New Feature
> Components: Search
> Affects Versions: 3.0.1
> Reporter: Jason Rutherglen
> Assignee: Michael Busch
> Fix For: 3.1
>
>
> In order to offer users near-realtime search, without incurring
> an indexing performance penalty, we can implement search on
> IndexWriter's RAM buffer. This is the buffer that is filled in
> RAM as documents are indexed. Currently the RAM buffer is
> flushed to the underlying directory (usually disk) before being
> made searchable.
> Today's Lucene-based NRT systems must incur the cost of merging
> segments, which can slow indexing.
> Michael Busch has good suggestions regarding how to handle deletes using max
> doc ids.
> https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841923&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841923
> The area that isn't fully fleshed out is the terms dictionary,
> which needs to be sorted prior to queries executing. Currently
> IW implements a specialized hash table. Michael B has a
> suggestion here:
> https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841915&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841915