At 11:26 AM +0200 3/6/00, Valdas Andrulis wrote:
>logs are attached.
Thanks very much. I didn't have much time to look at your logs until
now. Ooo, that's a nasty bug! So far you're finding the best ones. :-)
Here's the problem. You reindex and it deletes the old document. So
far, so good. Except that when it goes to delete the old entry in
db.docs.index, that database is keyed by URL, so it deletes the new
document's entry instead (see below). Whoops! On the next reindexing
run, it can't find an entry for that URL in db.docs.index, so it
fetches the page again, completely ignoring the record already in the
document database itself. Voila! Duplicate records.
It deletes the new entry in db.docs.index because that file is a
DB_HASH, which doesn't support duplicate keys. By the time the old
document is deleted, the entry for the new document has already
overwritten the old one under the same URL key.
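To make the sequence concrete, here is a toy model of the failure. It is only a sketch under stated assumptions: a plain dict stands in for the DB_HASH db.docs.index (URL -> DocID, one value per key), a second dict stands in for the document database, and the function names and the example URL are hypothetical, not ht://Dig code.

```python
# Toy model of the reindexing bug. Hypothetical stand-ins:
#   docs_index ~ db.docs.index (DB_HASH, URL -> DocID, no duplicate keys)
#   doc_db     ~ the document database (DocID -> URL)
docs_index = {}
doc_db = {}

def add_document(url, doc_id):
    doc_db[doc_id] = url
    docs_index[url] = doc_id  # like DB_HASH: silently replaces any older mapping

def delete_document_buggy(doc_id):
    url = doc_db.pop(doc_id)
    # Bug: removes whatever the URL maps to now, which after a reindex
    # is the NEW document's ID, not doc_id.
    docs_index.pop(url, None)

url = "http://example.com/"

add_document(url, 1)          # first indexing run
add_document(url, 2)          # reindex: new copy gets a new ID
delete_document_buggy(1)      # old copy deleted, index entry lost with it
print(docs_index)             # {}
print(doc_db)                 # {2: 'http://example.com/'}

# Next run: the URL is missing from the index, so the page is fetched
# again under a fresh ID and document 2 is never cleaned up.
if url not in docs_index:
    add_document(url, 3)
print(sorted(doc_db))         # [2, 3]  (two records for one URL)
```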
In short, when we do a delete, we need to check to make sure the
URL->ID pair is the same as the one we're removing!
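A guarded delete along those lines might look like the sketch below. It uses the same hypothetical dict-based model (docs_index for db.docs.index, doc_db for the document database) and illustrates the idea only; it is not the contents of the attached patch.

```python
# Hypothetical dict-based model: docs_index ~ db.docs.index (URL -> DocID),
# doc_db ~ the document database (DocID -> URL).
docs_index = {}
doc_db = {}

def add_document(url, doc_id):
    doc_db[doc_id] = url
    docs_index[url] = doc_id  # one value per key, as with DB_HASH

def delete_document(doc_id):
    url = doc_db.pop(doc_id, None)
    if url is None:
        return
    # Only drop the index entry if it still points at this document.
    # After a reindex the URL maps to the newer ID and must be kept.
    if docs_index.get(url) == doc_id:
        del docs_index[url]

url = "http://example.com/"
add_document(url, 1)
add_document(url, 2)   # reindex overwrites URL -> 1 with URL -> 2
delete_document(1)     # the URL -> 2 mapping survives
print(docs_index)      # {'http://example.com/': 2}
```

With the check in place, the next run finds the URL in the index and treats document 2 as the current copy instead of fetching it again.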
Thanks very much Valdas!
-Geoff
DocumentDB.patch
------------------------------------
To unsubscribe from the htdig3-dev mailing list, send a message to
[EMAIL PROTECTED]
You will receive a message to confirm this.