On 14. Jan 2007, at 8:51, Doron Cohen wrote:


I think that one effective way to control docid changes, assuming the
delete/update rate is significantly lower than the add rate, is to
modify Lucene such that deleted docs are only 'squeezed out' when
calling optimize(). This would involve delicate changes in the merging
code, but is possible. Then, once there are 'too many' deletions, the
application could call optimize().

We will have a high delete/update to add ratio, i.e. we will change a
lot, but not add new documents that often. Other than that, I will give
that suggestion some thought; I can live with the index size growing
until we do an optimize.

This way, having full control over when deleted docs are 'squeezed', and
also knowing which docs these are (the same docs that the same app
deleted during the last X hours), the application can at that point
update the mapping between Lucene IDs and the database IDs, again
knowing that Lucene IDs are set deterministically by the order of
adding docs.

This would allow, as Erick mentioned earlier in this thread, creating
the filter from the database only, with no need to query Lucene for
that. You would probably need to copy that table so the existing table
can still be used by searchers referencing the index as it was before
optimize() was called, at least until the db table is updated and some
index warming is done.
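
As a rough illustration of the bookkeeping described above, here is a minimal sketch in plain Java. It assumes, as stated above, that Lucene IDs follow the order in which docs were added and that deleted docs are only squeezed out by optimize(). The class DocIdTable and its methods are hypothetical application-side names, not part of Lucene.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical application-side table; nothing here is Lucene API. */
class DocIdTable {
    // Position in the list is the assumed Lucene doc id; value is the database id.
    private final List<Long> dbIdByDocId = new ArrayList<>();
    // Doc ids the application has deleted since the last optimize().
    private final BitSet deleted = new BitSet();

    /** Records an add and returns the doc id implied by insertion order. */
    int add(long dbId) {
        dbIdByDocId.add(dbId);
        return dbIdByDocId.size() - 1;
    }

    /** Marks a doc as deleted; its slot stays occupied until squeezed(). */
    void delete(int docId) {
        if (docId < 0 || docId >= dbIdByDocId.size()) {
            throw new IndexOutOfBoundsException("no such doc id: " + docId);
        }
        deleted.set(docId);
    }

    long dbId(int docId) {
        return dbIdByDocId.get(docId);
    }

    int size() {
        return dbIdByDocId.size();
    }

    /**
     * Builds the table as it should look once optimize() has squeezed out
     * the deleted docs. This table is left untouched, so searchers still
     * on the pre-optimize index can keep using it.
     */
    DocIdTable squeezed() {
        DocIdTable next = new DocIdTable();
        for (int docId = 0; docId < dbIdByDocId.size(); docId++) {
            if (!deleted.get(docId)) {
                next.add(dbIdByDocId.get(docId));
            }
        }
        return next;
    }
}
```

Returning a fresh table rather than renumbering in place is one way to get the "copy that table" behaviour: the old table serves old searchers until the new one is swapped in.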

Yes, I like this idea a lot - tracking the doc ids and saving them. It would allow for a speedy bitset creation. I'm not totally sure it can be done though.
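
To make the "speedy bitset creation" concrete, here is a small sketch under the same assumptions. The database-ID-to-Lucene-ID map is taken as given (how it is kept in sync is the open question), and DbBackedBits and bitsFor are made-up names for illustration. It only builds a java.util.BitSet; wiring that into a Lucene filter is left out.

```java
import java.util.BitSet;
import java.util.Collection;
import java.util.Map;

/** Hypothetical helper; builds filter bits without querying the index. */
class DbBackedBits {
    /**
     * @param docIdByDbId  tracked mapping from database id to Lucene doc id
     * @param allowedDbIds database ids that should pass the filter
     * @param maxDoc       number of doc ids in the index (sizing hint only)
     */
    static BitSet bitsFor(Map<Long, Integer> docIdByDbId,
                          Collection<Long> allowedDbIds,
                          int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (Long dbId : allowedDbIds) {
            Integer docId = docIdByDbId.get(dbId);
            // Ids unknown to the mapping (e.g. never indexed) are skipped.
            if (docId != null) {
                bits.set(docId);
            }
        }
        return bits;
    }
}
```

The cost is one map lookup per allowed database ID, so the filter is only as trustworthy as the tracked mapping.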

I am not sure that I am happy with this direction; I just wanted to
point out the possibility. It would have been convenient for this if
Lucene's writer had an option like "keepDeletions" or something, though
I am not sure yet whether this can be implemented without complicating
the code too much, or whether it is general enough to be in the API.

I'm also not sure this would be a good addition to stock Lucene, since
it is a really special requirement. But it would be cool to have
something like this in case someone wants to do weird things, as we do ;)

Thanks a bunch!

cheers,

-k

--
Kay Röpke
http://classdump.org/




