On Nov 16, 2007, at 11:59 AM, Antoine Baudoux wrote:

        I'm trying to implement a similar solution.


Could you be more precise on how you handle duplicates, as well as document deletion?

The key is probably (it was for us, anyway) that you have a fast way of determining whether or not a given document is in an index. We use (as do John et al., I suppose) the unique id (!= Lucene doc id) that each document carries for that purpose. The basic idea for that should be in the archives.
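
Just to illustrate the idea (this is only a sketch, not our actual code, and the "uid" field name is an assumption): if every document carries its unique id in an indexed field, a single TermDocs lookup answers "is this document in that index?" against a Lucene 2.x-style IndexReader:

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;

// Sketch only: cheap "is this document already in that index?" check,
// assuming each document stores its application-level unique id in an
// indexed "uid" field (field name made up for the example).
public final class UidLookup {

    // true if at least one live document in 'reader' carries this uid
    public static boolean contains(IndexReader reader, String uid) throws IOException {
        TermDocs td = reader.termDocs(new Term("uid", uid));
        try {
            return td.next();
        } finally {
            td.close();
        }
    }

    // Lucene doc id for the uid, or -1 if the index does not contain it
    public static int docIdFor(IndexReader reader, String uid) throws IOException {
        TermDocs td = reader.termDocs(new Term("uid", uid));
        try {
            return td.next() ? td.doc() : -1;
        } finally {
            td.close();
        }
    }
}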

So, back to the question:
By definition anything in the RAM index is newer than anything on disk, so documents found in the RAM index should supersede docs from disk when they have the same unique id (user id, primary key, whatever). Once you have the hits of a query you can easily spot duplicate primary keys, and for those you look up which index they came from (by asking an enhanced MultiReader that knows its sub-indices and their doc id ranges). That last operation obviously has to be very fast, which is why we use our custom id => docid mapping mechanism (and I think John is using his own, too).
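
Again only a sketch, not our implementation: if the MultiReader is built as new MultiReader(new IndexReader[] { diskReader, ramReader }), MultiReader numbers documents sub-reader by sub-reader, so every doc id >= diskReader.maxDoc() comes from the RAM index. That alone is enough to drop the disk copy whenever both indices return the same uid (again an assumed field name):

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiReader;

// Sketch: collapse duplicate uids from the raw hits, letting the RAM copy win.
public class RamWinsDeduper {

    private final int ramOffset; // first MultiReader doc id belonging to the RAM sub-reader

    public RamWinsDeduper(IndexReader diskReader) {
        this.ramOffset = diskReader.maxDoc();
    }

    private boolean isFromRam(int docId) {
        return docId >= ramOffset;
    }

    // docIds are MultiReader-wide ids of the hits; returns uid -> surviving doc id
    public Map<String, Integer> dedupe(MultiReader reader, int[] docIds) throws IOException {
        Map<String, Integer> best = new HashMap<String, Integer>();
        for (int i = 0; i < docIds.length; i++) {
            int docId = docIds[i];
            Document doc = reader.document(docId);
            String uid = doc.get("uid");
            Integer previous = best.get(uid);
            // the RAM copy supersedes an older disk copy with the same uid
            if (previous == null || (isFromRam(docId) && !isFromRam(previous.intValue()))) {
                best.put(uid, Integer.valueOf(docId));
            }
        }
        return best;
    }
}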

There are probably even more clever ways of doing this, but it should give you an idea. :)

cheers,
-k
