A bit more info:
The addLink documentation says: "Links are only permitted in the webdb if they
have a valid source MD5 for a Page that is also in the webdb". Yet I can
insert a link whose source MD5 belongs to a page that is not in the webdb.
Also, I can now filter out the offending links by reading both the pages and
the links in MD5 order, after adding the following (seemingly missing) method
to the WebDBReader class:

    /**
     * Iterate through all the Links, sorted by MD5
     */
    public Enumeration linksByMD5() throws IOException {
        return new MapEnumerator(linksByMD5);
    }

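For what it's worth, the filtering step itself can be sketched as a sorted merge over the two MD5-ordered streams. This is only an illustration under assumptions, not the actual purger code: LinkFilterSketch, Link and keepLinksWithKnownSource are made-up names, plain hex strings stand in for the webdb's MD5 values, and the two iterators stand in for the enumerations the reader returns for pages and links in MD5 order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class LinkFilterSketch {

    /** Hypothetical stand-in for a webdb link: source page MD5 plus target URL. */
    public static final class Link {
        public final String fromMd5;
        public final String toUrl;

        public Link(String fromMd5, String toUrl) {
            this.fromMd5 = fromMd5;
            this.toUrl = toUrl;
        }
    }

    /**
     * Keeps only the links whose source MD5 matches a known page.
     * Assumes both inputs are sorted ascending by MD5 (String order here),
     * so a single pass over each stream is enough.
     */
    public static List<Link> keepLinksWithKnownSource(Iterator<String> pageMd5s,
                                                      Iterator<Link> links) {
        List<Link> kept = new ArrayList<>();
        String page = pageMd5s.hasNext() ? pageMd5s.next() : null;
        while (links.hasNext()) {
            Link link = links.next();
            // Advance the page cursor until it reaches or passes the link's source.
            while (page != null && page.compareTo(link.fromMd5) < 0) {
                page = pageMd5s.hasNext() ? pageMd5s.next() : null;
            }
            // The cursor is not advanced on a match, so several links
            // sharing one source page are all kept.
            if (page != null && page.equals(link.fromMd5)) {
                kept.add(link);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> pages = Arrays.asList("0a", "3f", "9c");
        List<Link> links = Arrays.asList(
                new Link("0a", "http://example.com/a"),
                new Link("1b", "http://example.com/orphan"),
                new Link("3f", "http://example.com/b"),
                new Link("3f", "http://example.com/c"));
        for (Link l : keepLinksWithKnownSource(pages.iterator(), links.iterator())) {
            System.out.println(l.fromMd5 + " -> " + l.toUrl);
        }
    }
}
```

With the sample data in main, the link whose source is "1b" is dropped because no page has that MD5, and the other three links are kept. In the purger, each kept link would then be passed to the writer instead of being collected in a list.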
> -----Original Message-----
> From: Handl, Jorge [mailto:[EMAIL PROTECTED]
> Sent: Monday, September 5, 2005 16:51
> To: [email protected]
> Subject: linksByMD5
>
>
> Hi!
>
> I'm writing a webdb purger, and I have an issue with writing to the new
> db the links of the pages that haven't been purged.
>
> The docs seem to imply that adding a link having a source page that is
> not present in the webdb should fail, but apparently it doesn't.
>
> So I try to filter out the links that shouldn't be inserted, but I can't
> access the links by MD5, even though I find both linksByURL and
> linksByMD5 directories in the webdb... Why is that so?
>
> Thanks!
>