[jira] Commented: (LUCENE-1879) Parallel incremental indexing

Grant Ingersoll (JIRA) Fri, 26 Mar 2010 12:42:49 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850322#action_12850322
 ]


Grant Ingersoll commented on LUCENE-1879:
-----------------------------------------

First off, I haven't looked at the code here, or at the comments beyond skimming
them. This is something I've had in my head for a long time, though I don't have
any code. When I think about the whole update problem, I keep coming back to the
notion of Photoshop Layers, which essentially mask the underlying part of the
photo without damaging it. The analogy isn't exact here, but
nevertheless...

This leads me to wonder whether the solution is best achieved at the index level
rather than at the Reader/Writer level.

So, thinking out loud here, and I'm not sure of the best wording for this:
when a document first comes in, it is all in one place, just as it is now.
Then, when an update comes in on a particular field, we somehow mark in the
index that the document in question has been modified, and we append the new
change to the end of the index (just as we currently do when adding new docs,
except this time it's a doc with a single field). Then, at search time, when
scoring the affected documents, we would consult a secondary process that knows
where to look up the incremental changes. As background merging takes place,
these "disjoint" documents would be merged back together. We might even
consider a "high update" merge scheduler that handles these incremental merges
more frequently. In a sense, the old field for that document is masked by the
new field. I think that, given the proper index structure, we _maybe_ could
make the marking of the old field fast (maybe it's a pointer to the new field,
maybe it's just a bit indicating to go look in the "update" segment).
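To make the layering idea a bit more concrete, here is a minimal in-memory
sketch. It is purely hypothetical: LayeredIndex, FieldUpdate, updateField and
mergeUpdates are illustrative names, not Lucene APIs, and there is no real
segment format or scoring here. It assumes documents are plain field maps
addressed by docID. A per-doc bit (the "just a bit" option above) marks
modified documents, single-field updates are appended to a separate update
list, lookups let the newest update mask the base field, and a merge step folds
the updates back in.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of layered field updates; not a Lucene API. */
final class LayeredIndex {
    /** A single-field update that masks one field of one base document. */
    record FieldUpdate(int docId, String field, String value) {}

    // Base documents, addressed by docID (their position in this list).
    private final List<Map<String, String>> baseDocs = new ArrayList<>();
    // One bit per docID: set when the document has pending field updates.
    private final BitSet modified = new BitSet();
    // The "update segment": single-field updates appended in arrival order.
    private final List<FieldUpdate> updates = new ArrayList<>();

    /** Adds a whole document, as indexing works today; returns its docID. */
    int addDocument(Map<String, String> fields) {
        baseDocs.add(new HashMap<>(fields)); // mutable copy so merges can write into it
        return baseDocs.size() - 1;
    }

    /** Marks the doc as modified and appends a one-field "doc" to the update list. */
    void updateField(int docId, String field, String value) {
        if (docId < 0 || docId >= baseDocs.size()) {
            throw new IllegalArgumentException("unknown docID " + docId);
        }
        modified.set(docId);
        updates.add(new FieldUpdate(docId, field, value));
    }

    /** Returns the visible value: the newest update masks the base field. */
    String getField(int docId, String field) {
        if (modified.get(docId)) {
            // Linear scan, newest first; a real design would index the updates.
            for (int i = updates.size() - 1; i >= 0; i--) {
                FieldUpdate u = updates.get(i);
                if (u.docId() == docId && u.field().equals(field)) {
                    return u.value();
                }
            }
        }
        return baseDocs.get(docId).get(field);
    }

    int pendingUpdates() {
        return updates.size();
    }

    /** Stand-in for background merging: fold updates into base docs, oldest first. */
    void mergeUpdates() {
        for (FieldUpdate u : updates) {
            baseDocs.get(u.docId()).put(u.field(), u.value()); // later updates overwrite earlier ones
        }
        updates.clear();
        modified.clear();
    }
}
```

The bit check keeps lookups for unmodified documents on the fast path, and
mergeUpdates plays the role of the (possibly more frequent) "high update"
merge that keeps the number of disjoint documents low.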

On the search side, I think performance would still be maintained, because even
in high-update environments you aren't usually talking about more than a few
thousand changes in a minute or two, and the background merger would be
responsible for keeping the total number of disjoint documents low.

> Parallel incremental indexing
> -----------------------------
>
>                 Key: LUCENE-1879
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1879
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index
>            Reporter: Michael Busch
>            Assignee: Michael Busch
>             Fix For: 3.1
>
>         Attachments: parallel_incremental_indexing.tar
>
>
> A new feature that allows building parallel indexes and keeping them in sync 
> on a docID level, independent of the choice of the MergePolicy/MergeScheduler.
> Find details on the wiki page for this feature:
> http://wiki.apache.org/lucene-java/ParallelIncrementalIndexing 
> Discussion on java-dev:
> http://markmail.org/thread/ql3oxzkob7aqf3jd

-- 
This message is automatically generated by JIRA.

