Thanks for the background, Adrien. IndexSortSortedNumericDocValuesRangeQuery is a neat idea! I imagine that in the logs use case, where every search has a filter, this optimization matters a lot.
In https://github.com/apache/lucene-solr/pull/715 the benchmark indexed 123M docs. The results for "range with single point [897303051, 897303051], 124 docs" showed a slight slowdown compared to what we had originally. However, the number of matching documents was very small compared to the total number of docs.

I created another dataset locally, indexing 5M docs with 10 unique values for the filtering field.

Query 1:
    Query longPointFq = LongPoint.newExactQuery("category", 1);

Query 2:
    Query fallbackQuery = SortedNumericDocValuesField.newSlowRangeQuery("category_dv", 1, 1);
    IndexSortSortedNumericDocValuesRangeQuery optimizedFq = new IndexSortSortedNumericDocValuesRangeQuery("category_dv", 1, 1, fallbackQuery);

I ran each query 1000 times and recorded the total time:
    Query 1 took 3300ms
    Query 2 took 150ms

The numbers were pretty consistent across a couple of runs. Curious to hear your thoughts on using this optimization for exact queries as well.

On Thu, Mar 5, 2020 at 7:59 AM Adrien Grand <jpou...@gmail.com> wrote:

> We don't directly take advantage of index sort in this case, but index
> sorting still makes this faster. I had mentioned it in a presentation a
> couple years ago
> https://speakerdeck.com/elastic/get-the-lay-of-the-lucene-land-1?slide=14:
> querying geonames for TYPE:CITY AND CONTRY_CODE_US ran 1.6x faster when the
> index is sorted by TYPE then CONTRY_CODE.
>
> There are two contributing factors to it. The first one is that postings
> are cheaper to decode, because they consist of long range of doc IDs that
> increment by 1. The second is that having filters that match dense range of
> doc IDs is a better case for ConjunctionDISI than combining iterators whose
> doc IDs are interleaved.
>
> We have a single query that takes advantage of index sorting explicitly to
> my knowledge: IndexSortSortedNumericDocValuesRangeQuery.
> This query runs range queries on numbers using doc values by binary
> searching the doc IDs that map to the start and the end of the interval.
>
> On Thu, Mar 5, 2020 at 12:56 AM Varun Thacker <va...@vthacker.in> wrote:
>
>> If I have an index sorted by category and at search time filter on one
>> category
>>
>> Do we currently take advantage of index sort for this sort of a filter
>> query?
>>
>
> --
> Adrien
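Adrien's description of IndexSortSortedNumericDocValuesRangeQuery (binary searching for the doc IDs at the start and end of the interval) can be illustrated with a small Java sketch. This is not Lucene's code. It assumes a hypothetical in-memory array `docValues` holding each document's value for the sort field in doc ID order, rather than reading doc values from a segment, and the class and method names (`SortedRangeSketch`, `firstDocAtLeast`, `firstDocAbove`, `docIdRange`) are made up for illustration. Because the index is sorted on that field, the array is non-decreasing, so all matching documents form one contiguous doc ID run. An exact-match filter such as category == 1 is then just the range [1, 1], which is what Query 2 above passes.

```java
// Hypothetical sketch, not Lucene's implementation.
// docValues[docId] is an assumed in-memory stand-in for the index-sort field's
// values in doc ID order. Since the index is sorted on this field the array is
// non-decreasing, so any range [lower, upper] matches one contiguous doc ID run.
class SortedRangeSketch {

    // First doc ID whose value is >= target (docValues.length if there is none).
    static int firstDocAtLeast(long[] docValues, long target) {
        int lo = 0, hi = docValues.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (docValues[mid] < target) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    // First doc ID whose value is > target (docValues.length if there is none).
    // Written separately instead of searching for target + 1 to avoid overflow
    // when target == Long.MAX_VALUE.
    static int firstDocAbove(long[] docValues, long target) {
        int lo = 0, hi = docValues.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (docValues[mid] <= target) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    // Matching docs as a half-open interval {minDoc, maxDocExclusive}.
    // An exact-match filter is simply lower == upper.
    static int[] docIdRange(long[] docValues, long lower, long upper) {
        if (lower > upper) {
            return new int[] {0, 0}; // empty range
        }
        return new int[] {firstDocAtLeast(docValues, lower), firstDocAbove(docValues, upper)};
    }

    public static void main(String[] args) {
        // 8 docs sorted by a hypothetical "category" field.
        long[] category = {0, 0, 1, 1, 1, 2, 3, 3};
        int[] range = docIdRange(category, 1, 1);
        // Prints: category == 1 -> docs [2, 5)
        System.out.println("category == 1 -> docs [" + range[0] + ", " + range[1] + ")");
    }
}
```

Both bounds cost O(log n) comparisons no matter how many documents match, and the result is a dense run of consecutive doc IDs, the kind of filter Adrien notes is a good case for ConjunctionDISI.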