Hi Team,

I was discussing this problem with Greg Miller (also at Amazon Product Search):
If I want to make a query that filters out a few primary keys (ASINs in our Amazon Product Search world), I can make a TermInSetQuery and add it as a MUST_NOT clause onto a BooleanQuery that has all the other interesting clauses for my query.

But if I have many, many ASINs to filter out, at some point it may become more efficient to just use doc values and filter them out during collection, like Solr's "post-filter": for each candidate hit, load the BINARY value or the SORTED (globalized) ordinal and check it against e.g. a HashSet to see if the hit should be skipped. That would not use the inverted index at all.

Do we already have such a "slow DV TermInSet" query? It seems like it could belong in SortedDocValuesField, where we already have newSlowRangeQuery and newSlowExactQuery; we could add a newSlowInSetQuery. And then we could make an IndexOrDocValuesQuery with both the TermInSetQuery and this SortedDocValuesField.newSlowInSetQuery?

Or maybe there is already a good way to do this in Lucene?

Thanks!

Mike McCandless

http://blog.mikemccandless.com
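
A rough sketch of the doc-values idea described above, in plain Java rather than the Lucene API. Everything in it is a stand-in chosen for illustration: the arrays play the role of per-document doc values, and the class and method names (DocValuesPostFilterSketch, filterByValue, filterByOrdinal) are hypothetical. It only shows the two per-hit checks mentioned in the message (value lookup in a HashSet, or ordinal lookup after resolving the excluded keys once), not how a real query would be wired in.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of a doc-values "post-filter"; plain arrays stand in for per-document doc values. */
class DocValuesPostFilterSketch {

  /** BINARY-style check: look up each candidate's key and skip it if the key is excluded. */
  static List<Integer> filterByValue(String[] keyByDoc, int[] candidateDocs, Set<String> excluded) {
    List<Integer> kept = new ArrayList<>();
    for (int doc : candidateDocs) {
      if (!excluded.contains(keyByDoc[doc])) {
        kept.add(doc);
      }
    }
    return kept;
  }

  /** SORTED-style check: resolve excluded keys to ordinals once, then test one bit per candidate. */
  static List<Integer> filterByOrdinal(
      String[] sortedTerms, int[] ordByDoc, int[] candidateDocs, Set<String> excluded) {
    BitSet excludedOrds = new BitSet(sortedTerms.length);
    for (String key : excluded) {
      int ord = Arrays.binarySearch(sortedTerms, key);
      if (ord >= 0) { // keys that are not in the term dictionary can never match a document
        excludedOrds.set(ord);
      }
    }
    List<Integer> kept = new ArrayList<>();
    for (int doc : candidateDocs) {
      if (!excludedOrds.get(ordByDoc[doc])) {
        kept.add(doc);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    // Four documents; doc 1 and doc 3 share the same (made-up) key.
    String[] keyByDoc = {"B001", "B002", "B003", "B002"};
    String[] sortedTerms = {"B001", "B002", "B003"}; // sorted term dictionary
    int[] ordByDoc = {0, 1, 2, 1};                   // ordinal of each document's key
    int[] candidates = {0, 1, 2, 3};                 // hits produced by the rest of the query
    Set<String> excluded = new HashSet<>(Arrays.asList("B002", "B999"));

    System.out.println(filterByValue(keyByDoc, candidates, excluded));                // [0, 2]
    System.out.println(filterByOrdinal(sortedTerms, ordByDoc, candidates, excluded)); // [0, 2]
  }
}
```

The cost of either check grows with the number of candidate hits rather than with the number of excluded keys, which is why it might win once the exclusion list gets very large; where the crossover actually sits would need measuring.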