On Sep 6, 2006, at 10:30 AM, Yonik Seeley wrote:
> So it looks like you have intermediate things that aren't lucene segments, but end up producing valid lucene segments at the end of a session?
That's one way of thinking about it. There's only one "thing" though: a big bucket of serialized index entries. At the end of a session, those are sorted, pulled apart, and used to write the tis, tii, frq, and prx files.
Everything else (e.g., stored fields) gets written incrementally as documents get added. The fact that stored fields don't get shuffled around is one of this algorithm's advantages (along with much lower memory requirements, etc.).
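Roughly, the flow looks like the sketch below. All the names are hypothetical, tab-separated strings stand in for the real serialized entries, and a plain in-memory list stands in for the sort pool (which in practice spills to disk):

    import java.util.*;

    // Hypothetical sketch of the session flow described above -- not
    // KinoSearch's or Lucene's actual API.
    class PostingsBucket {
        private final List<String> bucket = new ArrayList<>();  // serialized index entries
        private int docId = 0;

        void addDocument(Map<String, String> storedFields, List<String> terms) {
            writeStoredFields(docId, storedFields);              // stored fields go out right away
            int pos = 0;
            for (String term : terms) {
                bucket.add(term + "\t" + docId + "\t" + pos++);  // buffer one serialized entry
            }
            docId++;
        }

        void finishSession() {
            Collections.sort(bucket);                            // one big sort at session's end
            for (String entry : bucket) {
                // Pulled apart here: terms -> tis/tii, doc data -> frq, positions -> prx.
                System.out.println(entry);
            }
        }

        private void writeStoredFields(int doc, Map<String, String> fields) {
            // In the real index this appends to the stored-fields files incrementally.
        }
    }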
> For Java lucene, I think the biggest indexing gain could be had by not buffering using single doc segments, but something optimized for in-memory single segment creation.
In theory, you could apply this technique to a limited number of docs at a time and create segments, say, 10 docs at a time rather than 1. But then you still have to do something with each 10-doc segment, and you lose the benefits of less disk shuffling and lower RAM usage. Better to just create 1 segment per session.
> The downside is complexity... two sets of "merge" code.
KS doesn't have SegmentMerger. :)
> It would be interesting to see an IndexWriter2 for full Gordian Knot cutting like you do :-)
I've already contributed a Java port of KinoSearch's external sorter (along with its tests), which is the crucial piece. The rest isn't easy, but stay tuned. ;)
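For anyone unfamiliar with the technique, the general shape of an external sort (sorted runs spilled to temp files, then a k-way merge) is sketched below. This is a generic illustration over lines of text, not the contributed sorter's actual API:

    import java.io.*;
    import java.nio.file.*;
    import java.util.*;

    // Generic external merge sort over lines of text -- an illustration of
    // the technique only, not the sorter contributed to Lucene.
    public class ExternalLineSorter {

        // Sort `input` into `output`, holding at most `maxInMemory` lines at once.
        public static void sort(Path input, Path output, int maxInMemory) throws IOException {
            List<Path> runs = new ArrayList<>();

            // Phase 1: read chunks, sort each in memory, spill each as a sorted "run".
            try (BufferedReader in = Files.newBufferedReader(input)) {
                List<String> buffer = new ArrayList<>(maxInMemory);
                String line;
                while ((line = in.readLine()) != null) {
                    buffer.add(line);
                    if (buffer.size() >= maxInMemory) {
                        runs.add(spill(buffer));
                        buffer.clear();
                    }
                }
                if (!buffer.isEmpty()) runs.add(spill(buffer));
            }

            // Phase 2: k-way merge of the sorted runs via a priority queue.
            PriorityQueue<RunReader> queue = new PriorityQueue<>();
            for (Path run : runs) {
                RunReader r = new RunReader(run);
                if (r.advance()) queue.add(r);
            }
            try (BufferedWriter out = Files.newBufferedWriter(output)) {
                while (!queue.isEmpty()) {
                    RunReader top = queue.poll();
                    out.write(top.current);
                    out.newLine();
                    if (top.advance()) queue.add(top); else top.close();
                }
            }
        }

        private static Path spill(List<String> buffer) throws IOException {
            Collections.sort(buffer);
            Path run = Files.createTempFile("sort-run", ".tmp");
            Files.write(run, buffer);
            return run;
        }

        // One open run file plus its current (smallest unconsumed) line.
        private static class RunReader implements Comparable<RunReader> {
            final BufferedReader reader;
            String current;

            RunReader(Path p) throws IOException { reader = Files.newBufferedReader(p); }

            boolean advance() throws IOException { return (current = reader.readLine()) != null; }

            void close() throws IOException { reader.close(); }

            public int compareTo(RunReader other) { return current.compareTo(other.current); }
        }
    }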
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/