Hi,
Reading the code for LinkDb.reduce(): if we have page duplicates in the
input segments, or if we have two copies of the same input segment, we
will create the same Inlink value (satisfying Inlink.equals()) multiple
times. Since Inlinks is a facade over a List, not a Set, we end up with
duplicate Inlink-s inside Inlinks (if you know what I mean ;).
The problem is easy to reproduce: create a new linkdb from 2 identical
segments. This problem also makes it more difficult to properly
implement a LinkDb updating mechanism (i.e. incremental invertlinks).
I propose to change Inlinks to use Set semantics, either explicitly by
using a HashSet, or implicitly by checking whether a value to be added
already exists.
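For illustration, here is a minimal sketch of the explicit HashSet variant. The class and field names below are simplified stand-ins, not the actual Nutch classes; the point is only that a Set-backed Inlinks silently drops a duplicate Inlink, so re-processing an identical segment is harmless:

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Simplified stand-in for Nutch's Inlink: equals()/hashCode()
// must be consistent so that HashSet can detect duplicates.
class Inlink {
    private final String fromUrl;
    private final String anchor;

    Inlink(String fromUrl, String anchor) {
        this.fromUrl = fromUrl;
        this.anchor = anchor;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Inlink)) return false;
        Inlink other = (Inlink) o;
        return fromUrl.equals(other.fromUrl) && anchor.equals(other.anchor);
    }

    @Override
    public int hashCode() {
        return Objects.hash(fromUrl, anchor);
    }
}

// Simplified stand-in for Inlinks, backed by a HashSet instead of a List:
// add() becomes a no-op when an equal Inlink is already present.
class Inlinks {
    private final Set<Inlink> inlinks = new HashSet<>();

    public void add(Inlink inlink) {
        inlinks.add(inlink);
    }

    public int size() {
        return inlinks.size();
    }
}

public class InlinksDemo {
    public static void main(String[] args) {
        Inlinks inlinks = new Inlinks();
        // Simulate reducing two identical segments: the same link
        // arrives twice, but is stored only once.
        inlinks.add(new Inlink("http://example.com/a", "anchor text"));
        inlinks.add(new Inlink("http://example.com/a", "anchor text"));
        System.out.println(inlinks.size()); // prints 1
    }
}
```

The implicit variant (keeping the List but checking contains() before add) preserves insertion order, at the cost of a linear scan per add.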
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com