Re: Parallel incremental indexing

Yonik Seeley Sun, 30 Aug 2009 06:08:50 -0700

Cool stuff!

We should also think about how to do single-document field updates or
field adds, since that is the most common use case. It doesn't need
to be implemented in the first version, but we should keep it in mind
so we don't box ourselves in.


Doug mentioned some ideas in passing almost a year ago about
how to add a field to a single document, and they were similar in that
they used a parallel reader.  IndexWriter would be modified to maintain
the same structure across parallel indexes, as you note.  If one wanted
to add a new field value to document 1000, one would have to index dummy
documents for docs 0-999.  Instead, the index format should support
gaps.  On a segment merge, the IndexWriter could then simply merge
in this new segment.
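
To make the dummy-docs vs. gaps point concrete, here is a rough sketch
in plain Java. It is not Lucene code and does not reflect any actual
index format: the class SparseFieldSketch and its methods are made up
for illustration, and a single String per document stands in for a
field value. It only shows the difference in what has to be stored, and
how a sparse update could be folded in at merge time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; none of these names exist in Lucene.
class SparseFieldSketch {

    // Dense parallel segment: doc IDs are implied by position, so setting
    // a value on docId means padding every earlier slot with a dummy.
    static List<String> denseSegment(int docId, String value) {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < docId; i++) {
            docs.add(null); // dummy document, carries no field value
        }
        docs.add(value);
        return docs;
    }

    // Segment with gaps: only documents that actually have a value are
    // stored, keyed by doc ID and kept in doc ID order.
    static Map<Integer, String> sparseSegment(int docId, String value) {
        Map<Integer, String> docs = new TreeMap<>();
        docs.put(docId, value);
        return docs;
    }

    // Merge step: copy the base column and overlay the sparse updates.
    // Documents without an update keep their existing value.
    static String[] merge(String[] base, Map<Integer, String> updates) {
        String[] merged = base.clone();
        for (Map.Entry<Integer, String> e : updates.entrySet()) {
            int docId = e.getKey();
            if (docId < 0 || docId >= merged.length) {
                throw new IllegalArgumentException("no such doc: " + docId);
            }
            merged[docId] = e.getValue();
        }
        return merged;
    }

    public static void main(String[] args) {
        int dense = denseSegment(1000, "red").size();
        int sparse = sparseSegment(1000, "red").size();
        System.out.println("dense entries:  " + dense);
        System.out.println("sparse entries: " + sparse);
    }
}
```

The sorted map is just the simplest way to show "doc ID plus value, in
doc ID order"; a real format would presumably encode the gaps far more
compactly.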

Anyway, updateable documents are fundamental enough that we should also
consider changes to the index format if that makes it easier.

-Yonik
http://www.lucidimagination.com


On Sun, Aug 30, 2009 at 2:23 AM, Michael Busch<[email protected]> wrote:
> Hi all,
>
> I just added a wiki page for a new feature I'd like to add to
> Lucene. Please take a look at the link. I will add more details and
> diagrams to the page, but for now it should give a rough idea about
> how to implement it:
>
> http://wiki.apache.org/lucene-java/ParallelIncrementalIndexing
>
> Basically the idea is to allow updating documents partially, e.g. only
> a subset of the fields without having to reindex the entire
> document. This is a feature that is very often asked for.
>
> We have implemented the solution at IBM and it's working
> great. It is a technology that has already allowed us to add really
> exciting new features to products that weren't easily possible before.
>
> The implementation I can currently contribute has some limitations:
> e.g. multi-threaded indexing is not supported. But let me make clear
> that this is not a limitation of the design described in the wiki - we
> have these limitations because we implemented this on top of Lucene's 2.4
> APIs. If we decide to add this to Lucene's core we should
> reimplement some parts to overcome those limitations.
>
> In my opinion this will be a great addition to Lucene that many
> people will find very useful. In Solr this is also something users often
> ask for.
>
> In the last weeks I worked on getting internal approval for the contribution
> to Lucene and the good news is that I already have a signed
> software grant ready - so if the community likes this feature and
> decides to add this to Lucene there won't be any delay for legal work
> from IBM's side.
>
> Btw: I will be on vacation from 09/03-09/20 and won't have internet
> access most of the time, so if I stop responding end of next week you'll
> know why...
>
> Please let me know what you think!
>
>  Michael
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>

