Sorry, I think I gave the wrong Java class name. I want to confirm whether SegmentMerger.java in Lucene does dedup or not. Starting from SegmentMerger.java, I traced through a couple of related classes, such as SegmentReader.java and IndexWriter.java. I haven't seen a dedup mechanism yet.
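To make clear what I mean by dedup, here is a rough sketch of the rule from my earlier message: when two documents share the same MD5 of their content, keep only the one with the higher document ID. This is plain Java, not code taken from Lucene or Nutch; the class name Md5DedupSketch and the methods md5Hex and survivingDocIds are hypothetical names I made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of Lucene or Nutch.
public class Md5DedupSketch {

    // Hex-encoded MD5 of the document content (UTF-8 bytes).
    static String md5Hex(String content) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(content.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Input: document ID -> content. Output: sorted IDs that survive dedup.
    // For each distinct MD5, only the highest document ID is kept.
    static List<Integer> survivingDocIds(Map<Integer, String> docs) {
        Map<String, Integer> highestIdPerHash = new HashMap<>();
        for (Map.Entry<Integer, String> e : docs.entrySet()) {
            highestIdPerHash.merge(md5Hex(e.getValue()), e.getKey(), Integer::max);
        }
        List<Integer> survivors = new ArrayList<>(highestIdPerHash.values());
        Collections.sort(survivors);
        return survivors;
    }

    public static void main(String[] args) {
        Map<Integer, String> docs = new HashMap<>();
        docs.put(0, "same page body");
        docs.put(1, "a different page");
        docs.put(2, "same page body"); // duplicate of doc 0
        System.out.println(survivingDocIds(docs)); // prints [1, 2]
    }
}
```

Docs 0 and 2 have identical content, so only doc 2 survives. This is the kind of step I was looking for in the merge path and did not find.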
thanks,
Michael Ji

--- Yonik Seeley <[EMAIL PROTECTED]> wrote:

> Sorry, I've only briefly looked at Nutch, so you should ask on that mailing list.
> Lucene doesn't do deduping.
>
> -Yonik
> Now hiring -- http://tinyurl.com/7m67g
>
> On 10/14/05, Michael Ji <[EMAIL PROTECTED]> wrote:
> >
> > hi Yonik:
> >
> > Does that mean when two documents have the same MD5 content in two different segments, IndexMerger.java will keep both of them?
> >
> > When I look at the code of IndexSegment.java, it handles MD5 deduping by keeping the one with the higher document ID.