I haven't tried it, but according to http://lucene.apache.org/java/docs/fileformats.html, each segment is a complete sub-index. I _wonder_ if you couldn't manage your own merges by using IndexWriter.addIndexes() where you load each segment in separately (this may mean copying the segments to other directories, but I am not sure). Another option would be to modify Lucene to expose the merge functionality.

This is pure speculation at this point, but I know the capabilities exist (as all optimize does is merge segments until there is one segment) so it seems like it should be possible.
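
To make the idea concrete, here is a toy sketch in plain Java of "purge deletes per segment, but keep the segments separate". It is not Lucene code and I have not run anything like it against a real index: Segment, deletedRatio, purge, purgeSparseSegments and the 0.5 threshold are all names and values made up for illustration. In Lucene itself the rewrite step would have to go through something like the addIndexes() route above or an exposed merge.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class SegmentPurgeSketch {

    /** Toy stand-in for an index segment: document ids plus the ids marked deleted. */
    static final class Segment {
        final List<Integer> docs;
        final Set<Integer> deleted;

        Segment(List<Integer> docs, Set<Integer> deleted) {
            this.docs = new ArrayList<>(docs);
            this.deleted = new HashSet<>(deleted);
        }

        /** Fraction of this segment's documents that are marked deleted. */
        double deletedRatio() {
            if (docs.isEmpty()) {
                return 0.0;
            }
            int count = 0;
            for (int d : docs) {
                if (deleted.contains(d)) {
                    count++;
                }
            }
            return (double) count / docs.size();
        }

        /** Rewrites this one segment, keeping only its live documents. */
        Segment purge() {
            List<Integer> live = new ArrayList<>();
            for (int d : docs) {
                if (!deleted.contains(d)) {
                    live.add(d);
                }
            }
            return new Segment(live, new HashSet<Integer>());
        }
    }

    /**
     * Rewrites only the segments whose deleted ratio reaches the threshold.
     * The number of segments is unchanged; nothing is merged together.
     */
    static List<Segment> purgeSparseSegments(List<Segment> segments, double threshold) {
        List<Segment> result = new ArrayList<>();
        for (Segment s : segments) {
            result.add(s.deletedRatio() >= threshold ? s.purge() : s);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Segment> segments = new ArrayList<>();
        // Three of four docs deleted: worth rewriting.
        segments.add(new Segment(Arrays.asList(0, 1, 2, 3),
                new HashSet<>(Arrays.asList(0, 1, 2))));
        // One of four docs deleted: left alone.
        segments.add(new Segment(Arrays.asList(4, 5, 6, 7),
                new HashSet<>(Arrays.asList(4))));

        List<Segment> purged = purgeSparseSegments(segments, 0.5);
        for (Segment s : purged) {
            System.out.println(s.docs.size() + " docs, " + s.deleted.size() + " deleted");
        }
    }
}
```

The point of the sketch is only the shape of the operation: each segment is considered on its own, so a full-size segment that is mostly deletes gets shrunk without everything collapsing into one big segment.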

-Grant

On Dec 1, 2006, at 8:11 AM, Stanislav Jordanov wrote:

Guys,

I've already asked this question but nobody answered:

Suppose we have a relatively big index that is continuously updated, i.e. new docs get added while some of the old docs get deleted. For pragmatic reasons we have a restriction on maxMergeDocs so that segment files don't get enormously big. Now consider a segment of max size (i.e. containing maxMergeDocs docs, hence not eligible for a merge). It is possible that, as time passes, this segment will have more and more of its docs deleted. But as it is not mergeable, it will remain the same size, with lots of "holes" in it, which is bad for performance. The only way that I am aware of to correct this problem is to invoke index optimization, which has two drawbacks:
1. It takes a while to optimize a big index.
2. The optimization process always produces an index consisting of a single (extremely) large segment.
We can live with 1.
But 2 is undesirable.
Is there a way to "optimize" (in terms of purging its deleted docs) an index or a single segment
without ending up with a single segment index?

Best,
Stanislav

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


--------------------------
Grant Ingersoll
Center for Natural Language Processing
http://www.cnlp.org

Read the Lucene Java FAQ at http://wiki.apache.org/jakarta-lucene/LuceneFAQ



