On Nov 16, 2007, at 11:59 AM, Antoine Baudoux wrote:

        I'm trying to implement a similar solution.


Could you be more precise on how you handle duplicates, as well as document deletion?

The key is probably (it was for us, anyway) that you have a fast way of determining whether or not a given document is in an index. We use (as do John et al., I suppose) the unique id (!= Lucene doc id) that each document carries for that purpose. The basic idea for that should be in the archives.
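
Just to illustrate the idea (this is only a sketch, not our actual code, and the "uid" field name is an assumption): if every document carries its unique id in an indexed field, a single TermDocs lookup answers "is this document in that index?" against a Lucene 2.x-style IndexReader:

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;

// Sketch only: cheap "is this document already in that index?" check,
// assuming each document stores its application-level unique id in an
// indexed "uid" field (field name made up for the example).
public final class UidLookup {

    // true if at least one live document in 'reader' carries this uid
    public static boolean contains(IndexReader reader, String uid) throws IOException {
        TermDocs td = reader.termDocs(new Term("uid", uid));
        try {
            return td.next();
        } finally {
            td.close();
        }
    }

    // Lucene doc id for the uid, or -1 if the index does not contain it
    public static int docIdFor(IndexReader reader, String uid) throws IOException {
        TermDocs td = reader.termDocs(new Term("uid", uid));
        try {
            return td.next() ? td.doc() : -1;
        } finally {
            td.close();
        }
    }
}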

So, back to the question:
By definition anything in the RAM index is newer than anything on disk, so documents found in the RAM index should supersede docs from disk when they have the same unique id (user id, primary key, whatever). Once you have the hits of a query you can easily spot duplicate primary keys, and for those you look up which index they came from (by asking an enhanced MultiReader that knows its sub-indices and their doc id ranges). That last operation obviously has to be very fast, which is why we use our custom id => docid mapping mechanism (and I think John is using his own, too).
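
Again only a sketch, not our implementation: if the MultiReader is built as new MultiReader(new IndexReader[] { diskReader, ramReader }), MultiReader numbers documents sub-reader by sub-reader, so every doc id >= diskReader.maxDoc() comes from the RAM index. That alone is enough to drop the disk copy whenever both indices return the same uid (again an assumed field name):

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiReader;

// Sketch: collapse duplicate uids from the raw hits, letting the RAM copy win.
public class RamWinsDeduper {

    private final int ramOffset; // first MultiReader doc id belonging to the RAM sub-reader

    public RamWinsDeduper(IndexReader diskReader) {
        this.ramOffset = diskReader.maxDoc();
    }

    private boolean isFromRam(int docId) {
        return docId >= ramOffset;
    }

    // docIds are MultiReader-wide ids of the hits; returns uid -> surviving doc id
    public Map<String, Integer> dedupe(MultiReader reader, int[] docIds) throws IOException {
        Map<String, Integer> best = new HashMap<String, Integer>();
        for (int i = 0; i < docIds.length; i++) {
            int docId = docIds[i];
            Document doc = reader.document(docId);
            String uid = doc.get("uid");
            Integer previous = best.get(uid);
            // the RAM copy supersedes an older disk copy with the same uid
            if (previous == null || (isFromRam(docId) && !isFromRam(previous.intValue()))) {
                best.put(uid, Integer.valueOf(docId));
            }
        }
        return best;
    }
}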

There are probably even more clever ways of doing this, but it should give you an idea. :)

cheers,
-k
