On Mon, 2019-02-11 at 13:42 +0100, Daniel Penning wrote:
> I think sorting and searching on docvalues can benefit from similar
> optimizations like impacts do when scoring is used. By storing the
> minimum and maximum value for blocks of docvalues it would be
> possible to skip over sets of documents that don't contain any values
> in the relevant range.
Nice idea. The docID-indexed blocks are handled by IndexedDISI, where each block represents 65536 docs. It would be simple enough to enrich these with min & max, although it would mean 16 bytes extra for each block, where the current minimum is 0 bytes + 8 bytes for the jump table entry. It could be made into an option of course. As Alan suggests, the jump tables would hold the values for maximum skippiness.

I am unsure how much API change it would require for the values to be used. Adding a method such as nextDocIDWhereNumericMightBeHigherThan() seems awfully specialized and might require knowledge of the block size used inside of IndexedDISI, which is implementation specific. I do like the general idea of "can we somehow skip ahead?", though.

Another sore point is how much this would statistically help. What are the chances that a block can be skipped at all? It depends on the distribution of the values themselves: sorting descending on a long-tail distribution might work very well, whereas a more uniform distribution would result in no speed-up at all. A date field in a time-series corpus, indexed in chronological order, would work extremely well sorting ascending and extremely poorly descending (possibly it's the other way around. It's getting late here).

So yeah, it is definitely technically possible and a very interesting idea. And maybe very specialized?

Cc: to Daniel as the thread is a month old.

- Toke Eskildsen, Royal Danish Library

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
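P.S. For concreteness, a minimal Java sketch of the block-level min/max skipping discussed above. All names here (BlockMinMaxSkipper, nextDocWhereValueMightExceed) are invented for illustration and are not part of Lucene's IndexedDISI API; a real implementation would presumably read the min/max from the enriched jump table entries rather than hold them in arrays.

```java
// Hypothetical sketch, NOT Lucene code: per-block min/max metadata used to
// skip whole 65536-doc blocks whose value range cannot match a query bound.
public class BlockMinMaxSkipper {
    private static final int BLOCK_SIZE = 65536; // docs per IndexedDISI block

    private final long[] blockMin; // minimum docvalue per block
    private final long[] blockMax; // maximum docvalue per block

    public BlockMinMaxSkipper(long[] blockMin, long[] blockMax) {
        this.blockMin = blockMin;
        this.blockMax = blockMax;
    }

    /**
     * Returns the first docID at or after fromDoc whose block might contain
     * a value greater than threshold, or -1 if no such block exists.
     * Blocks whose recorded max is <= threshold are skipped wholesale.
     */
    public int nextDocWhereValueMightExceed(int fromDoc, long threshold) {
        for (int block = fromDoc / BLOCK_SIZE; block < blockMax.length; block++) {
            if (blockMax[block] > threshold) {
                return Math.max(fromDoc, block * BLOCK_SIZE);
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Three blocks with value ranges [0,10], [5,20], [100,200].
        BlockMinMaxSkipper s = new BlockMinMaxSkipper(
            new long[] {0, 5, 100}, new long[] {10, 20, 200});
        // Looking for values > 50 skips the first two blocks entirely,
        // landing on the first docID of the third block.
        System.out.println(s.nextDocWhereValueMightExceed(0, 50)); // 131072
    }
}
```

This also illustrates the API concern above: the caller never sees BLOCK_SIZE, but the granularity of skipping is still dictated by it.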
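P.P.S. To get a feel for how much the value distribution matters, here is a toy experiment (plain Java, no Lucene) counting how many fixed-size blocks are fully skippable for a "values above threshold" query, comparing index order that matches the sort (e.g. chronological timestamps) against the same values shuffled. The block size of 1000 is chosen arbitrarily to keep the example small.

```java
import java.util.Random;

// Toy experiment, not Lucene code: fraction of blocks whose max value falls
// below a threshold, i.e. blocks a min/max-aware iterator could skip.
public class SkipRateDemo {
    static final int BLOCK = 1000; // stand-in for IndexedDISI's 65536

    // Count blocks whose max is <= threshold (fully skippable).
    static int skippableBlocks(long[] values, long threshold) {
        int skipped = 0;
        for (int start = 0; start < values.length; start += BLOCK) {
            long max = Long.MIN_VALUE;
            for (int i = start; i < Math.min(start + BLOCK, values.length); i++) {
                max = Math.max(max, values[i]);
            }
            if (max <= threshold) skipped++;
        }
        return skipped;
    }

    public static void main(String[] args) {
        int n = 100_000;
        long[] ordered = new long[n];
        for (int i = 0; i < n; i++) ordered[i] = i; // e.g. timestamps in index order
        long[] shuffled = ordered.clone();
        Random rnd = new Random(42);
        for (int i = n - 1; i > 0; i--) { // Fisher-Yates shuffle
            int j = rnd.nextInt(i + 1);
            long tmp = shuffled[i]; shuffled[i] = shuffled[j]; shuffled[j] = tmp;
        }
        long threshold = 90_000; // only the top 10% of values are wanted
        System.out.println("ordered:  " + skippableBlocks(ordered, threshold) + "/100 blocks skippable");
        System.out.println("shuffled: " + skippableBlocks(shuffled, threshold) + "/100 blocks skippable");
    }
}
```

With ordered values, 90 of 100 blocks are skippable; with the shuffled copy, essentially none are, since almost every block of 1000 random values contains at least one value above the threshold. That is the long-tail/chronological case versus the uniform case in miniature.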