On 11/9/09 5:40 PM, Michael Busch wrote:
I think that should be ok with parallel indexing, as long as we can
always select all corresponding segments from *all* parallel indexes
for a merge to keep the docIds in sync.
That actually leads me to another question: Let's say you have three
segments a, b, c. b and c share the same doc store. You perform
deletes on a and b. Then you call expungeDeletes(). Normally that call
should only merge a and b, because c doesn't have any deletes. But b
and c have to participate in the same merge, because they share the
same doc store, right? So would it merge all three segments?
If that's the case (that b and c must be part of the same merge) then
it would make the parallel indexing more difficult. The reason is that
if two parallel indexes 1 and 2 can decide on their own how to share
e.g. doc stores across segments, then we might end up in a situation
where 1a and 1b share the same doc store, and 2b and 2c share the same
doc store. Then if index 1 needs to merge 1a and 1b, it can't assume
that this merge is allowed. There would have to be someone on top of
the whole thing who decides that all three segments need to be merged
at the same time, because b is connected to a and c in the two
parallel indexes. I wouldn't like such a restriction very much.
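To make the coupling concrete, here is a minimal sketch (not Lucene code) of the bookkeeping such a coordinator would need. It assumes that segments at the same position in every parallel index must be merged together, and that segments sharing a doc store in any one index must be merged together. The class and method names (MergeGroups, shareDocStore, mustMergeWith) are hypothetical. A union-find over segment positions then yields the set that has to go into one merge:

```java
import java.util.*;

public class MergeGroups {
    // Union-find parent map over segment positions ("a", "b", "c", ...).
    private final Map<String, String> parent = new HashMap<>();

    private String find(String x) {
        parent.putIfAbsent(x, x);
        String p = parent.get(x);
        if (!p.equals(x)) {
            p = find(p);
            parent.put(x, p); // path compression
        }
        return p;
    }

    // Record that two positions share a doc store in some parallel index.
    public void shareDocStore(String s1, String s2) {
        parent.put(find(s1), find(s2));
    }

    // All positions that would have to be merged together with the given one.
    public Set<String> mustMergeWith(String segment) {
        String root = find(segment);
        Set<String> group = new TreeSet<>();
        for (String s : new ArrayList<>(parent.keySet())) {
            if (find(s).equals(root)) {
                group.add(s);
            }
        }
        return group;
    }

    public static void main(String[] args) {
        MergeGroups g = new MergeGroups();
        g.shareDocStore("a", "b"); // index 1: 1a and 1b share a doc store
        g.shareDocStore("b", "c"); // index 2: 2b and 2c share a doc store
        System.out.println(g.mustMergeWith("a")); // prints [a, b, c]
    }
}
```

Under these assumptions a merge of 1a and 1b drags c in as well, which is exactly the restriction described above.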
We could think about allowing merges like ab->d, even if b,c share the
same doc store. That would mean copying the b part of the shared bc
doc store into the new segment d. Then, until c gets deleted, the stored
docs of b would be on disk twice and temporarily require more disk space.
I think this is exactly what happens? I wrote a small test program that
creates the situation described above in the "expungeDeletes"
scenario. It ends up with a docstore containing docs from two segments,
but after expungeDeletes only one segment references the docstore. The
non-deleted docs from the other segment end up in a new segment, so they
are on disk twice (once orphaned in the old docstore, once in the new
segment).
Is that the desired behavior?
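The disk-space effect can be illustrated with a toy model. This is not Lucene code and not the test program mentioned above. It assumes that deleted docs stay physically in a doc store until the store is dropped, that a merge copies only live docs into a new private store, and that an old store is removed once no segment references it. All names (DocStoreCopyModel, store-bc, and so on) are hypothetical:

```java
import java.util.*;

public class DocStoreCopyModel {
    private final Map<String, String> storeOf = new HashMap<>();    // segment -> doc store
    private final Map<String, Integer> liveDocs = new HashMap<>();  // segment -> live docs
    private final Map<String, Integer> storeSize = new HashMap<>(); // doc store -> docs on disk

    public void addSegment(String seg, String store, int docs) {
        storeOf.put(seg, store);
        liveDocs.put(seg, docs);
        storeSize.merge(store, docs, Integer::sum);
    }

    // Deletes only mark docs; the stored docs stay on disk.
    public void delete(String seg, int count) {
        liveDocs.merge(seg, -count, Integer::sum);
    }

    // Merge sources into target, copying live docs into a new private store.
    public void merge(String target, String... sources) {
        int live = 0;
        for (String s : sources) {
            live += liveDocs.remove(s);
            storeOf.remove(s);
        }
        addSegment(target, target + "-store", live);
        // Drop stores that no remaining segment references.
        storeSize.keySet().retainAll(new HashSet<>(storeOf.values()));
    }

    public int storedDocsOnDisk() {
        int total = 0;
        for (int n : storeSize.values()) total += n;
        return total;
    }

    public int liveDocsTotal() {
        int total = 0;
        for (int n : liveDocs.values()) total += n;
        return total;
    }

    public static void main(String[] args) {
        DocStoreCopyModel m = new DocStoreCopyModel();
        m.addSegment("a", "store-a", 10);
        m.addSegment("b", "store-bc", 10);
        m.addSegment("c", "store-bc", 10);
        m.delete("a", 2);
        m.delete("b", 3);
        System.out.println(m.storedDocsOnDisk()); // prints 30
        m.merge("d", "a", "b");                   // ab->d, c keeps store-bc alive
        System.out.println(m.storedDocsOnDisk()); // prints 35
    }
}
```

In this model the index holds 25 live docs after the merge but 35 stored docs on disk, because the b part of store-bc stays orphaned until c is merged away or deleted.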
Michael
Well maybe there is already a solution for all this in the code and
I'm just not aware of it?
Michael
Mike
---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
---------------------------------------------------------------------