Are there any best practices for constructing Filters to search efficiently?
From my non-exhaustive experiments I cannot intuit how to construct my filters
to achieve best performance.

I have an index (Lucene 4.3) of about 1.8M documents which contain a field
acting as a flag (evidence:true). Initially all the documents I am interested in
searching have this field. Later as the index grows some documents will not have
this field.

In the simplest case I want to filter on documents with evidence:true. I ran a
couple of hundred queries sequentially and recorded how long the whole batch
took to complete:

 * No filter: ~40s
 * QueryWrapperFilter(TermQuery(evidence:true)): ~80s
 * FieldValueFilter(evidence): ~43s
 * TermsFilter(evidence:true): ~50s
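
A minimal sketch of the kind of timing loop described above, using only the
JDK. The class name BatchTimer, the warmupRuns parameter, and the stand-in
task are hypothetical; the untimed warm-up pass is an assumption and not
something the figures above are known to include. In a real run each Runnable
would wrap one IndexSearcher search call with the filter under test.

```java
import java.util.ArrayList;
import java.util.List;

/** Times a batch of tasks run one after another (hypothetical helper). */
final class BatchTimer {
    /**
     * Runs the whole batch warmupRuns times without timing it, then once
     * more while timing it, and returns the elapsed wall-clock milliseconds.
     */
    static long timeBatchMillis(List<Runnable> tasks, int warmupRuns) {
        for (int i = 0; i < warmupRuns; i++) {
            for (Runnable task : tasks) {
                task.run();            // untimed: lets JIT and OS caches settle
            }
        }
        long start = System.nanoTime();
        for (Runnable task : tasks) {
            task.run();                // timed pass, strictly sequential
        }
        return (System.nanoTime() - start) / 1_000_000L;
    }

    public static void main(String[] args) {
        List<Runnable> tasks = new ArrayList<>();
        for (int i = 0; i < 200; i++) {
            // Stand-in for one filtered search; replace with the real query.
            tasks.add(() -> Math.sqrt(12345.678));
        }
        System.out.println(timeBatchMillis(tasks, 1) + " ms");
    }
}
```

Timing each filter variant with the same query list and the same number of
warm-up passes keeps the comparison like for like.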

This suggests QueryWrapperFilter (QWF) is a bad idea.

A more complex filter is:

  (evidence:true AND (cid:x OR cid:y ...) AND language:eng)

Here 1.8M documents match evidence:true, each cid clause matches 2-4 documents,
there are 1-60 cid clauses per filter, and 1.4M documents match language:eng.
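
To make the shape of that filter concrete, here is a toy model of it over
document ids using java.util.BitSet. This is only an illustration of the
boolean logic, not Lucene API: the class FilterModel and the tiny doc-id
ranges are made up. With the counts above, the OR of the cid clauses covers at
most 60 x 4 = 240 documents, so the final result is tiny next to the 1.8M and
1.4M sets it is intersected with.

```java
import java.util.BitSet;

/** Toy model of evidence:true AND (cid:x OR cid:y ...) AND language:eng. */
final class FilterModel {
    /** Returns the doc ids accepted by the combined filter; inputs are not modified. */
    static BitSet combine(BitSet evidence, BitSet[] cidClauses, BitSet language) {
        BitSet result = new BitSet();
        for (BitSet cid : cidClauses) {
            result.or(cid);            // union of the ORed cid clauses
        }
        result.and(evidence);          // required: evidence:true
        result.and(language);          // required: language:eng
        return result;
    }

    public static void main(String[] args) {
        BitSet evidence = new BitSet();
        evidence.set(0, 10);           // docs 0..9 carry evidence:true
        BitSet language = new BitSet();
        language.set(0, 8);            // docs 0..7 are language:eng
        BitSet cidX = new BitSet();
        cidX.set(2);
        cidX.set(3);
        BitSet cidY = new BitSet();
        cidY.set(7);
        cidY.set(9);                   // doc 9 is not language:eng
        // prints {2, 3, 7}
        System.out.println(combine(evidence, new BitSet[] {cidX, cidY}, language));
    }
}
```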

Our initial implementation uses a QueryWrapperFilter around a
BooleanQuery(TermQuery AND BooleanQuery(OR) AND TermQuery), which takes ~210s.

Changing this to a BooleanFilter(TermsFilter AND TermsFilter AND
TermsFilter) slows things down to ~239s!

Any advice on optimizing these filters would be appreciated!

James

