According to Patrice BAUMANN:
> Thank you for your answer ; I shall look for the cause of the problem. I
> have another question : in which file can I find the different url of the
> same document htdig store (doc.db, doc.index, doc.list...) and is this
> relevant for my problem ?

Yes, it's relevant for your problem in that you want to be able to see
what different URLs point to the same document, so you can figure out what
the best solution is to the problem.

However, there is no systematic or automatic way of getting a list of these.
If htdig were smart enough to do this, it would be smart enough to suppress
the duplicates.  In fact, in the 3.2 beta of htdig, there's an experimental
technique in the code that uses MD5 checksums to attempt to locate and
suppress duplicates.  Last I heard, it still had some problems (e.g. not
finding the "best" path to a document, missing documents that are "almost"
identical).
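
To give a rough idea of what checksum-based detection amounts to, here is a
minimal sketch in Python. It is not htdig's code and does not read htdig's
databases: the group_by_md5 function, the pages dictionary and the
example.com URLs are all made up for illustration, and it assumes you have
already fetched each document body yourself.

```python
import hashlib
from collections import defaultdict


def group_by_md5(pages):
    """Group URLs whose bodies are byte-for-byte identical.

    pages is a hypothetical dict mapping URL -> document body (bytes).
    Returns a list of URL lists, one per checksum shared by 2+ URLs.
    """
    groups = defaultdict(list)
    for url, body in pages.items():
        # Identical bodies produce identical MD5 digests.
        groups[hashlib.md5(body).hexdigest()].append(url)
    # Keep only the checksums that more than one URL maps to.
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


# Made-up sample data: two exact duplicates and two near-duplicates.
pages = {
    "http://www.example.com/": b"<html>Home</html>",
    "http://www.example.com/index.html": b"<html>Home</html>",
    "http://www.example.com/about.html": b"<html>About</html>",
    "http://www.example.com/About.html": b"<html>About </html>",
}

duplicates = group_by_md5(pages)
for urls in duplicates:
    print(" == ".join(urls))
```

Note that the two "about" pages differ by a single space, so they get
different checksums and are not reported. That is the "almost identical"
limitation mentioned above, and the sketch also makes no attempt to decide
which URL in a group is the "best" one.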

I think the best you can do is try some searches on your database and see
where duplicates come up.  When they do, take a close look at the URLs of
these duplicate documents and see how they differ, and how they can be
classified into the different cases I described previously.

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
