Are there any best practices for constructing Filters to search efficiently? From my non-exhaustive experiments I cannot intuit how to construct my filters to achieve best performance.
I have an index (Lucene 4.3) of about 1.8M documents which contain a field acting as a flag (evidence:true). Initially, all the documents I am interested in searching have this field. Later, as the index grows, some documents will not have this field.

In the simplest case I want to filter on documents with evidence:true. I ran a couple of hundred queries sequentially and recorded how long they took to complete:

* No filter: ~40s
* QueryWrapperFilter(TermQuery(evidence:true)): ~80s
* FieldValueFilter(evidence): ~43s
* TermsFilter(evidence:true): ~50s

This suggests QueryWrapperFilter (QWF) is a bad idea.

A more complex filter is:

(evidence:true AND (cid:x OR cid:y ...) AND language:eng)

where evidence:true matches 1.8M documents, each cid clause matches 2-4 documents, there are 1-60 cid clauses, and language:eng matches 1.4M documents.

Our initial implementation uses a QWF of a BooleanQuery(TQ AND BQ(OR) AND TQ), which takes ~210s. Adjusting this to be a BooleanFilter(TermsFilter AND TermsFilter AND TermsFilter) slows things down to ~239s!

Any advice on optimizing these filters would be appreciated!

James
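
For anyone reading along, here is a rough plain-Java model of what the compound filter (evidence:true AND (cid:x OR cid:y ...) AND language:eng) has to compute, using java.util.BitSet with one bit per document. This is not Lucene code and says nothing about how Lucene evaluates the filter internally. FilterModel and accept are hypothetical names, and the ten-document toy index is invented purely for illustration.

```java
import java.util.BitSet;

/** Plain-Java model of (evidence:true AND (cid:x OR cid:y ...) AND language:eng). */
class FilterModel {

    /** Returns the doc IDs accepted by all three clauses; the inputs are not modified. */
    static BitSet accept(BitSet evidence, BitSet[] cidClauses, BitSet langEng) {
        // OR the cid clauses together into a fresh bit set.
        BitSet result = new BitSet();
        for (BitSet cid : cidClauses) {
            result.or(cid);
        }
        // AND in the two broad clauses.
        result.and(evidence);
        result.and(langEng);
        return result;
    }

    public static void main(String[] args) {
        int maxDoc = 10;                 // toy index size (assumption)
        BitSet evidence = new BitSet(maxDoc);
        evidence.set(0, maxDoc);         // every doc has evidence:true
        BitSet langEng = new BitSet(maxDoc);
        langEng.set(0, 8);               // docs 0..7 are language:eng
        BitSet cidX = new BitSet(maxDoc);
        cidX.set(2);
        cidX.set(3);
        BitSet cidY = new BitSet(maxDoc);
        cidY.set(7);
        cidY.set(9);                     // doc 9 is not language:eng

        BitSet accepted = accept(evidence, new BitSet[] {cidX, cidY}, langEng);
        System.out.println(accepted);    // prints {2, 3, 7}
    }
}
```

In this model the result can never be larger than the union of the cid clauses, which in the index described above is a few hundred documents at most, against 1.8M and 1.4M for the other two clauses.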