I've put together a new Filter and JUnit test for eliminating duplicates
from search results.
The typical usage scenario is where multiple documents exist in the
index which share an untokenized field value (e.g. the same primary key
or URL). It is desirable to keep the copies in the index because some
searches need to see the multiple versions (e.g. to view a revision
history for a document). However, when a search needs to return only
one version of each document (often the latest version), this filter
can be used as an efficient means of filtering results. The
bitset produced marks ALL the "master" docs in an index for a field and
this filter can be safely cached for reuse with any query. Example usage:

    DuplicateFilter df = new DuplicateFilter(KEY_FIELD_NAME);
    df.setKeepMode(DuplicateFilter.KM_USE_LAST_OCCURRENCE);
    Hits h = searcher.search(query, df);

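To illustrate the idea behind the bitset, here is a minimal sketch in plain Java. It is not the actual DuplicateFilter code and does not touch the index: the class name DuplicateSketch, the method masterDocs and the in-memory array of key values (one per doc id) are all assumptions made for illustration. It marks one "master" doc per distinct key value, either the first or the last occurrence.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not part of Lucene or the proposed filter.
public class DuplicateSketch {

    /**
     * keysByDocId[i] holds the untokenized key value of doc i (assumed input).
     * Returns a bitset with exactly one bit set per distinct key value:
     * the last doc with that key if keepLast is true, otherwise the first.
     */
    static BitSet masterDocs(String[] keysByDocId, boolean keepLast) {
        Map<String, Integer> masterByKey = new HashMap<>();
        for (int docId = 0; docId < keysByDocId.length; docId++) {
            if (keepLast) {
                // Later docs overwrite earlier ones with the same key.
                masterByKey.put(keysByDocId[docId], docId);
            } else {
                // Only the first doc seen for a key is recorded.
                masterByKey.putIfAbsent(keysByDocId[docId], docId);
            }
        }
        BitSet bits = new BitSet(keysByDocId.length);
        for (int docId : masterByKey.values()) {
            bits.set(docId);
        }
        return bits;
    }

    public static void main(String[] args) {
        String[] keys = {"a", "b", "a", "c", "b"};
        System.out.println(masterDocs(keys, true));  // last occurrences
        System.out.println(masterDocs(keys, false)); // first occurrences
    }
}
```

Because the result depends only on the key field and not on any particular query, the same bitset can be computed once and reused, which is why caching the filter is safe.
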
If anyone else finds this useful I'll commit it.
Cheers
Mark