At 11:26 AM +0200 3/6/00, Valdas Andrulis wrote:
>logs are attached.

Thanks very much. I didn't have much time to look at your logs until 
now. Ooo, that's a nasty bug! So far you're finding the best ones. :-)

Here's the problem. You reindex and it deletes the old document. So 
far, so good. Except that the entry it goes to delete in 
db.docs.index is keyed by URL, so it ends up deleting the new 
document's entry (see below). Whoops! Now on another reindexing run, 
it can't find an entry for that URL in db.docs.index and grabs the 
page again, completely ignoring the entry in the actual database. 
Voila! Duplicate records.

It deletes the new entry in db.docs.index because that file is a 
DB_HASH. This doesn't support duplicate keys, so by the time you've 
added the entry for the new document, you've already overwritten the 
previous entry for that URL.
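
To make the sequence concrete, here is a toy model in Python. It is 
not the ht://Dig code: plain dicts stand in for the two database 
files, and every name in it (docs, url_index, add_document, 
delete_document_buggy, reindex_buggy) is made up for illustration. 
It assumes the new entry is added before the old document is deleted, 
as described above.

```python
# Toy model only: dicts stand in for the real database files.
docs = {}        # DocID -> URL (the actual document database)
url_index = {}   # URL -> DocID (db.docs.index, one value per key)

def add_document(doc_id, url):
    docs[doc_id] = url
    url_index[url] = doc_id   # overwrites any earlier entry for this URL

def delete_document_buggy(doc_id):
    url = docs.pop(doc_id)
    url_index.pop(url, None)  # removes whatever ID is stored under the URL

def reindex_buggy(old_id, new_id, url):
    add_document(new_id, url)      # index now points at new_id
    delete_document_buggy(old_id)  # ...and this wipes that entry

URL = "http://example.com/"
add_document(1, URL)
reindex_buggy(1, 2, URL)

# Document 2 is still in the database, but the index has lost its URL,
# so the next run would treat the page as unseen and fetch it again.
still_in_docs = 2 in docs
known_by_url = URL in url_index
```

After the run, still_in_docs is true and known_by_url is false, which 
is exactly the state that produces the duplicates.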

In short, when we do a delete, we need to check to make sure the 
URL->ID pair is the same as the one we're removing!
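
As a rough sketch of that check (again a toy Python model with 
made-up names, not the contents of the attached patch), the delete 
only touches the index entry when it still points at the document 
being removed:

```python
# Toy model only: dicts stand in for the real database files.
docs = {}        # DocID -> URL (the actual document database)
url_index = {}   # URL -> DocID (db.docs.index, one value per key)

def add_document(doc_id, url):
    docs[doc_id] = url
    url_index[url] = doc_id   # overwrites any earlier entry for this URL

def delete_document(doc_id):
    url = docs.pop(doc_id)
    # Only drop the index entry if the URL->ID pair matches this document.
    if url_index.get(url) == doc_id:
        del url_index[url]

URL = "http://example.com/"
add_document(1, URL)
add_document(2, URL)   # reindex: new entry replaces the old one
delete_document(1)     # stale pair, so the index is left alone
index_after_reindex = dict(url_index)

delete_document(2)     # matching pair, so the index entry goes too
index_after_removal = dict(url_index)
```

With the guard in place the index still maps the URL to document 2 
after the reindex, and the entry only disappears when document 2 
itself is deleted.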

Thanks very much Valdas!
-Geoff

DocumentDB.patch
