DuplicatesFilter - one for contrib?

markharw00d Sun, 30 Sep 2007 13:47:59 -0700

I've put together a new Filter and JUnit test for eliminating duplicates from search results.

The typical usage scenario is where multiple documents exist in the index which share an untokenized field value (e.g. the same primary key or URL). It is desirable to keep copies in the index because some searches wish to see the multiple versions (e.g. to view a revision history for a document). However, when a search is done which needs to return only one version of each document (often the latest version), this filter can be used as an efficient means of filtering results. The bitset produced marks ALL the "master" docs in an index for a field, and this filter can be safely cached for reuse with any query.


       DuplicateFilter df=new DuplicateFilter(KEY_FIELD_NAME);
       df.setKeepMode(DuplicateFilter.KM_USE_LAST_OCCURRENCE);
       Hits h = searcher.search(query,df);
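
To make the "master docs" idea concrete, here is a rough stand-alone sketch of how such a bitset could be computed. It is not the filter's actual source and does not use Lucene at all: the class name DuplicateBitsSketch, the method masterDocs, and the plain String[] of key values indexed by document id are all assumptions made for illustration.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not the DuplicateFilter implementation.
final class DuplicateBitsSketch {

    /**
     * keys[docId] holds the untokenized key-field value of each document.
     * Returns a bitset with exactly one bit set per distinct key:
     * the last document carrying that key if keepLast is true,
     * otherwise the first.
     */
    static BitSet masterDocs(String[] keys, boolean keepLast) {
        Map<String, Integer> masterForKey = new HashMap<>();
        for (int docId = 0; docId < keys.length; docId++) {
            if (keepLast) {
                // Later documents overwrite earlier ones with the same key.
                masterForKey.put(keys[docId], docId);
            } else {
                // Only the first document seen for a key is recorded.
                masterForKey.putIfAbsent(keys[docId], docId);
            }
        }
        BitSet bits = new BitSet(keys.length);
        for (int docId : masterForKey.values()) {
            bits.set(docId);
        }
        return bits;
    }

    /** Restricts a query's matching docs to the master docs. */
    static BitSet applyFilter(BitSet queryMatches, BitSet masters) {
        BitSet result = (BitSet) queryMatches.clone();
        result.and(masters);
        return result;
    }
}
```

Because the bitset built by masterDocs depends only on the key field and not on any query, it can be computed once and intersected with the matches of any number of queries, which is what makes caching safe.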


If anyone else finds this useful I'll commit it.

Cheers
Mark

