On Nov 16, 2007, at 11:59 AM, Antoine Baudoux wrote:
> I'm trying to implement a similar solution.
> Could you be more precise on how you handle duplicates, as well as
> document deletion?
The key probably is (it was for us, anyway) that you have a fast way
of determining whether or not a given document is in an index.
We use (and John et al, too, I suppose) the unique id (!= doc id) each
document has for that purpose. The basic idea for that should be in
the archives.
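
To make that concrete, here is a minimal sketch of such a lookup structure.
It is only an illustration under my own assumptions (a plain in-memory
HashMap, a String unique id); it is not necessarily how we or John implement
it, but it shows why the membership check can be a simple hash lookup instead
of a term query against the index.

import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: a fast unique-id -> Lucene doc-id map for one index.
 * The map is updated whenever documents are added, so membership checks
 * never have to touch the index itself.
 */
public class UidMap {
    private final Map<String, Integer> uidToDocId = new HashMap<String, Integer>();

    /** Record that the document with this unique id lives at this doc id. */
    public void put(String uid, int docId) {
        uidToDocId.put(uid, docId);
    }

    /** Fast check: is a document with this unique id in the index? */
    public boolean contains(String uid) {
        return uidToDocId.containsKey(uid);
    }

    /** Returns the Lucene doc id for a unique id, or -1 if unknown. */
    public int docIdFor(String uid) {
        Integer docId = uidToDocId.get(uid);
        return docId != null ? docId.intValue() : -1;
    }
}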
So, back to the question:
By definition anything in the RAM index is newer than anything on
disk, so documents found in the RAM index should supersede docs from
disk when they have the same unique id (user id, primary key, whatever).
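
As a rough illustration of that rule (again a sketch of mine, not our actual
code): after collecting hits from both indices, keep at most one hit per
unique id, and when a RAM hit and a disk hit share the same id, let the RAM
one win. The Hit record and the IndexRanges helper are hypothetical names;
IndexRanges is sketched a bit further down.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical hit record: composite doc id plus the document's unique id. */
class Hit {
    final int docId;
    final String uid;
    Hit(int docId, String uid) { this.docId = docId; this.uid = uid; }
}

public class HitDeduper {

    /**
     * Keep one hit per unique id; when a RAM hit and a disk hit share the
     * same unique id, the RAM hit wins because it is newer by definition.
     */
    public static List<Hit> dedupe(List<Hit> hits, IndexRanges ranges) {
        Map<String, Hit> byUid = new HashMap<String, Hit>();
        for (Hit hit : hits) {
            Hit seen = byUid.get(hit.uid);
            if (seen == null
                    || (!ranges.isRamDoc(seen.docId) && ranges.isRamDoc(hit.docId))) {
                byUid.put(hit.uid, hit);  // first hit for this uid, or RAM supersedes disk
            }
        }
        return new ArrayList<Hit>(byUid.values());
    }
}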
When you have the hits of the query you can easily determine duplicate
primary keys, and for those you look up from which index they came (by
asking an enhanced MultiReader that knows the indices and their doc id
ranges). The last operation obviously has to be very fast, thus we use
our custom id => docid mapping mechanism (and I think John is using
his own, too).
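
That range check could look roughly like this; a sketch under my own
assumption of exactly two sub-indices, disk first and RAM second, as a
composite reader would concatenate them. Since the sub-readers' doc id
ranges are laid out one after the other, the RAM index simply owns every
composite doc id at or above diskReader.maxDoc().

import org.apache.lucene.index.IndexReader;

/**
 * Hypothetical sketch of the "which index did this doc id come from" check.
 * Assumes a composite reader built over [diskReader, ramReader] in that
 * order, so the RAM index occupies the doc-id range [diskReader.maxDoc(), maxDoc).
 */
public class IndexRanges {
    private final int ramStart;  // first composite doc id belonging to the RAM index

    public IndexRanges(IndexReader diskReader) {
        this.ramStart = diskReader.maxDoc();
    }

    /** True if this composite doc id falls in the RAM index's doc-id range. */
    public boolean isRamDoc(int docId) {
        return docId >= ramStart;
    }
}

With more than two indices you would keep the array of cumulative maxDoc()
starts and binary-search it for a given doc id, which is essentially how a
MultiReader routes doc ids internally anyway.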
There are probably even more clever ways of doing this, but it should
give you an idea. :)
cheers,
-k