Any tips on this issue?

Thanks

Marco
  ----- Original Message ----- 
  From: Marco Dissel 
  To: java-user@lucene.apache.org 
  Sent: Friday, May 13, 2005 9:05 AM
  Subject: finding potential duplicate documents


  Hello

  I've got many documents that are potential duplicates (the result of merging 
several external systems). Any tips on how to find them, ideally using an 
adjustable similarity threshold (for example, a match score > 0.5)?

  I can use the similarity (MoreLikeThis) class from the Sandbox, but that 
always compares a single document against the index. Is there a way to get back 
all the potential duplicates in the index without iterating over every document 
and comparing it with all the other documents in the index?
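
  To make the question concrete, here is a rough sketch of the brute-force scan 
I'd like to avoid. It is plain Java, not Lucene code: the class name 
NaiveDuplicateScan, its methods, and the use of Jaccard overlap on word sets as 
a stand-in for a real similarity score are all assumptions made up for 
illustration. Only the 0.5 threshold comes from the question above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only: none of these names come from Lucene.
class NaiveDuplicateScan {

    // Split text into a set of lowercase word tokens.
    static Set<String> tokens(String text) {
        Set<String> result = new HashSet<>();
        for (String t : text.toLowerCase().split("\\W+")) {
            if (!t.isEmpty()) {
                result.add(t);
            }
        }
        return result;
    }

    // Jaccard similarity |A n B| / |A u B|, a value in [0, 1].
    // Assumed stand-in for whatever score the real system would use.
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0;
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int unionSize = a.size() + b.size() - intersection.size();
        return (double) intersection.size() / unionSize;
    }

    // Compare every pair (i, j) with i < j and keep the pairs scoring above
    // the threshold. This is the O(n^2) all-pairs pass the question asks
    // how to avoid.
    static List<int[]> findCandidates(List<String> docs, double threshold) {
        List<Set<String>> tokenSets = new ArrayList<>();
        for (String doc : docs) {
            tokenSets.add(tokens(doc));
        }
        List<int[]> pairs = new ArrayList<>();
        for (int i = 0; i < tokenSets.size(); i++) {
            for (int j = i + 1; j < tokenSets.size(); j++) {
                if (jaccard(tokenSets.get(i), tokenSets.get(j)) > threshold) {
                    pairs.add(new int[] {i, j});
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Made-up sample records, as if merged from several systems.
        List<String> docs = Arrays.asList(
            "Acme Corp, 12 Main Street, Springfield",
            "ACME Corp. 12 Main Street Springfield",
            "Globex Inc, 99 Elm Road, Shelbyville");
        for (int[] pair : findCandidates(docs, 0.5)) {
            System.out.println(pair[0] + " ~ " + pair[1]);
        }
    }
}
```

  With n documents this makes n*(n-1)/2 comparisons, which is why I'm hoping 
the index itself can be used to narrow down the candidates.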

  Thanks
  Marco


