Sorry, I've only briefly looked at Nutch, so you should ask on the Nutch
mailing list.
Lucene doesn't do deduping.


-Yonik
Now hiring -- http://tinyurl.com/7m67g

On 10/14/05, Michael Ji <[EMAIL PROTECTED]> wrote:
>
> hi Yonik:
>
> Does that mean when two documents have the same MD5 content
> hash in two different segments, IndexMerger.java will keep
> both of them?
>
> When I look at the code of IndexSegment.java, it
> handles MD5 deduping by keeping the one with the higher
> document ID.
>
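
For readers following the thread, the rule described in the question (one document kept per MD5 digest, the one with the higher document ID) can be sketched roughly as below. This is only an illustration using plain Java collections; it is not the Nutch or Lucene code, and the class name `Md5DedupSketch`, the method `keepHighestId`, and the sample IDs and digests are all made up for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class Md5DedupSketch {

    /**
     * Hypothetical helper: given parallel arrays of document IDs and the
     * MD5 digests of their content, returns a map from each digest to the
     * highest document ID seen with that digest.
     */
    static Map<String, Integer> keepHighestId(int[] ids, String[] md5s) {
        Map<String, Integer> best = new LinkedHashMap<>();
        for (int i = 0; i < ids.length; i++) {
            // If the digest was already seen, keep whichever ID is larger.
            best.merge(md5s[i], ids[i], Integer::max);
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up sample data: documents 3 and 9 share the same digest.
        int[] ids = {3, 7, 9};
        String[] md5s = {"aaa", "bbb", "aaa"};
        System.out.println(keepHighestId(ids, md5s)); // prints {aaa=9, bbb=7}
    }
}
```

Whether duplicates spread across two different segments are ever compared this way depends on the tool doing the merge, which is the open question in this thread.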
