On Monday 12 October 2009 14:53:45 Christoph Boosz wrote:
> Hi,
> 
> I have a question related to faceted search. My index contains more than 1
> million documents, and nearly 1 million terms. My aim is to get a DocIdSet
> for each term occurring in the result of a query. I use the approach
> described on
> http://sujitpal.blogspot.com/2007/04/lucene-search-within-search-with.html<https://service.gmx.net/de/cgi/derefer?TYPE=3&DEST=http%3A%2F%2Fsujitpal.blogspot.com%2F2007%2F04%2Flucene-search-within-search-with.html>,
> where a BitSet is built out of a QueryFilter for each term and intersected
> with the BitSet representing the user query.
> However, performance could be better. I guess it’s because the term filter
> considers each document in the index, even if it’s not in the result. My
> attempt to use a ChainedFilter, where the first filter (cached) is for the
> user query, and the second one for the term (done for all terms), didn’t
> speed things up, though.
> Am I missing something? Is there a better way to get the DocIdSets for a
> huge number of terms in a limited set of documents?

Assuming you only need the number of documents within the original query
that contain each term, one thing that can be saved is the allocation of the
resulting BitSet for each term. To do this, use the cached BitSet (or the
OpenBitSet in current lucene) for the original Query as a filter for a TermQuery
per term, and then count the matching documents by using a counting
HitCollector on the IndexSearcher.

Regards,
Paul Elschot

Reply via email to