On 14. Jan 2007, at 8:51, Doron Cohen wrote:
> I think that one effective way to control docid changes, assuming the
> delete/update rate is significantly lower than the add rate, is to modify
> Lucene such that deleted docs are only 'squeezed out' when calling
> optimize(). This would involve delicate changes in the merging code, but
> is possible. Then, once there are 'too many' deletions, the application
> could call optimize().
We will have a high delete/update to add ratio, i.e. we will change a lot
but not add new documents that often. Other than that, I will give that
suggestion some thought; I can live with the index size growing until we
do an optimize.
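To make sure I understand the squeeze-on-optimize behaviour, here is a toy simulation of it (plain Java, not Lucene code; the class and method names are made up): doc ids are assigned by add order, a delete only marks the slot, and the ids of the remaining docs shift down only when the deleted slots are squeezed out.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy simulation of squeeze-on-optimize, NOT Lucene code.
class DocIdSimulation {
    final List<String> docs = new ArrayList<>();      // slot index == doc id
    final Set<Integer> deleted = new LinkedHashSet<>();

    int add(String doc) {           // ids are deterministic: add order
        docs.add(doc);
        return docs.size() - 1;
    }

    void delete(int docId) {        // mark only; other ids stay stable
        deleted.add(docId);
    }

    void optimize() {               // squeeze deletions out; ids shift down
        List<String> compacted = new ArrayList<>();
        for (int id = 0; id < docs.size(); id++) {
            if (!deleted.contains(id)) compacted.add(docs.get(id));
        }
        docs.clear();
        docs.addAll(compacted);
        deleted.clear();
    }

    int idOf(String doc) {          // -1 if not present
        return docs.indexOf(doc);
    }
}
```

So any mapping built against the old ids stays valid until optimize() runs; after that it has to be rebuilt.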
> This way, having full control over when deleted docs are 'squeezed', and
> also knowing which docs these are (the same docs that the same app
> deleted during the last X hours), the application can at that point
> update the mapping between Lucene IDs and the database IDs, again knowing
> that Lucene IDs are set deterministically by the order of adding docs.
> This would allow, as Erick mentioned earlier in this thread, creating the
> filter from the database only, with no need to query Lucene for that. You
> would probably need to copy that table so the existing table can still be
> used by searchers referencing the index before optimize() was called, at
> least until the db table is updated and some index warming is done.
Yes, I like this idea a lot: tracking the doc ids and saving them. It
would allow for speedy bitset creation. I'm not totally sure it can be
done, though.
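For the bitset part, a minimal sketch of what I have in mind (hypothetical names, and plain java.util.BitSet rather than Lucene's Filter API): given the db-maintained table mapping database IDs to Lucene doc ids, the filter bits can be set straight from the database result, without querying Lucene at all.

```java
import java.util.BitSet;
import java.util.Map;

// Hypothetical sketch: build a filter bitset from a db-side mapping of
// database IDs to Lucene doc ids. Names (dbIdToDocId, buildFilter) are
// illustrative, not Lucene API.
class DbBackedFilter {
    static BitSet buildFilter(Map<Long, Integer> dbIdToDocId,
                              Iterable<Long> allowedDbIds,
                              int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (Long dbId : allowedDbIds) {
            Integer docId = dbIdToDocId.get(dbId);
            if (docId != null) {
                bits.set(docId);   // skip db ids not (yet) in the index
            }
        }
        return bits;
    }
}
```

The mapping table would be the copied one Doron mentions, so searchers opened before optimize() keep seeing consistent bits.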
> I am not sure that I am happy with this direction; I just wanted to
> point out the possibility. It would have been convenient for this if
> Lucene's writer had an option like "keepDeletions" or something, though
> I am not sure yet whether this can be implemented without complicating
> the code too much, or whether it is general enough to be in the API.
I'm also not sure this would be a good addition to stock Lucene, since it
is a really special requirement. But it would be cool to have something
like this in case someone wants to do weird things, as we do ;)
Thanks a bunch!
cheers,
-k
--
Kay Röpke
http://classdump.org/