Sounds interesting, Marvin. I would be willing to test out what you create. I am trying to create a rapidly updating index, and it sounds like this may help with that. I've noticed that even on a ramdisk the whole merging process is quite slow. The CPU is not maxed out either, maybe because of the locking that occurs. Seems like there is a lot of room for optimization. Cheers.
----- Original Message ----
From: Marvin Humphrey <[EMAIL PROTECTED]>
To: java-dev@lucene.apache.org
Sent: Wednesday, September 6, 2006 11:35:59 AM
Subject: Re: [jira] Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided)

On Sep 6, 2006, at 10:30 AM, Yonik Seeley wrote:

> So it looks like you have intermediate things that aren't lucene
> segments, but end up producing valid lucene segments at the end of a
> session?

That's one way of thinking about it. There's only one "thing" though: a big bucket of serialized index entries. At the end of a session, those are sorted, pulled apart, and used to write the tis, tii, frq, and prx files.

Everything else (e.g. stored fields) gets written incrementally as documents get added. The fact that stored fields don't get shuffled around is one of this algorithm's advantages (along with much lower memory requirements, etc).

> For Java lucene, I think the biggest indexing gain could be had by not
> buffering using single doc segments, but something optimized for
> in-memory single segment creation.

In theory, you could apply this technique only to a limited number of docs and create segments, say, 10 docs at a time rather than 1 at a time. But then you still have to do something with each 10 doc segment, and you don't get the benefits of less disk shuffling and lower RAM usage. Better to just create 1 segment per session.

> The downside is complexity... two
> sets of "merge" code.

KS doesn't have SegmentMerger. :)

> It would be interesting to see an IndexWriter2 for full Gordian Knot
> cutting like you do :-)

I've already contributed a Java port of KinoSearch's external sorter (along with its tests), which is the crucial piece. The rest isn't easy, but stay tuned.
;)

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
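
For illustration, the "big bucket of serialized index entries, sorted at the end of a session" idea from the quoted message could look roughly like the external sort below. This is only a sketch under stated assumptions, not KinoSearch's sorter or the Java port mentioned above. The class name `ExternalSortSketch`, the `term<TAB>docId` entry format, and the temp-file run layout are all hypothetical. Entries are buffered in RAM, spilled to disk as sorted runs once the buffer fills, and merged with a priority queue when the session ends.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.Consumer;

// Hypothetical sketch of an external sorter; not KinoSearch or Lucene code.
class ExternalSortSketch {
    private final int maxBuffered;                       // entries kept in RAM before spilling
    private final List<String> buffer = new ArrayList<>();
    private final List<Path> runs = new ArrayList<>();   // sorted runs already on disk

    ExternalSortSketch(int maxBuffered) {
        this.maxBuffered = maxBuffered;
    }

    // Add one serialized entry. Assumed format: "term\tdocId", with no newlines.
    // Entries are compared as plain strings, so numeric ids would need zero-padding.
    void add(String entry) {
        buffer.add(entry);
        if (buffer.size() >= maxBuffered) {
            spill();
        }
    }

    // Sort the in-memory buffer and write it to a temp file as one sorted run.
    private void spill() {
        if (buffer.isEmpty()) {
            return;
        }
        Collections.sort(buffer);
        try {
            Path run = Files.createTempFile("sort-run", ".txt");
            run.toFile().deleteOnExit();
            Files.write(run, buffer, StandardCharsets.UTF_8);
            runs.add(run);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        buffer.clear();
    }

    // One open run plus its current head line; ordered by that head line.
    private static final class Cursor implements Comparable<Cursor> {
        final BufferedReader reader;
        String head;

        Cursor(BufferedReader reader) throws IOException {
            this.reader = reader;
            this.head = reader.readLine();
        }

        @Override
        public int compareTo(Cursor other) {
            return head.compareTo(other.head);
        }
    }

    // End of session: k-way merge of all runs, handing entries to sink in sorted order.
    void finish(Consumer<String> sink) {
        spill();
        PriorityQueue<Cursor> queue = new PriorityQueue<>();
        try {
            for (Path run : runs) {
                Cursor c = new Cursor(Files.newBufferedReader(run, StandardCharsets.UTF_8));
                if (c.head != null) queue.add(c); else c.reader.close();
            }
            while (!queue.isEmpty()) {
                Cursor c = queue.poll();
                sink.accept(c.head);
                c.head = c.reader.readLine();
                if (c.head != null) queue.add(c); else c.reader.close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ExternalSortSketch sorter = new ExternalSortSketch(2);
        for (String e : new String[] {"pear\t1", "apple\t1", "zebra\t2", "apple\t2", "mango\t3"}) {
            sorter.add(e);
        }
        sorter.finish(System.out::println);
    }
}
```

In a real indexer the sink would be whatever writes the term dictionary and postings files, and memory use stays bounded by `maxBuffered` plus one buffered line per run, regardless of how many documents the session adds.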