sgup432 opened a new pull request, #15793: URL: https://github.com/apache/lucene/pull/15793
### Description

Related issue for more details: https://github.com/apache/lucene/issues/15770

- This PR adds MultiFieldDocValuesRangeQuery, which coordinates DocValuesSkipper evaluation across fields. BooleanQuery.rewrite() detects the pattern (two or more required NumericDocValuesRangeQuery clauses on distinct fields) and replaces those clauses with a single coordinated query.
- The main logic lives in the Concatenated iterator inside MultiFieldDocValuesRangeQuery. It works with the DocValuesSkipper of every targeted field and advances them together.
- The PR also includes a JMH benchmark to validate the change.
- Tested across different data patterns, document counts, and numbers of concurrent range fields.

## JMH Benchmark Results

| Pattern   | Docs | Fields | Without Optimization | With (Run 1) | With (Run 2) | Speedup   |
|-----------|------|--------|----------------------|--------------|--------------|-----------|
| clustered | 1M   | 3      | 16,417               | 60,758       | 61,342       | **3.7x**  |
| clustered | 1M   | 5      | 11,523               | 55,922       | 57,487       | **5.0x**  |
| clustered | 10M  | 3      | 16,148               | 54,827       | 55,677       | **3.4x**  |
| clustered | 10M  | 5      | 13,128               | 40,920       | 42,154       | **3.2x**  |
| mixed     | 1M   | 3      | 859                  | 836          | 1,001        | **1.17x** |
| mixed     | 1M   | 5      | 514                  | 706          | 873          | **1.70x** |
| mixed     | 10M  | 3      | 76                   | 79           | 79           | **1.03x** |
| mixed     | 10M  | 5      | 50                   | 65           | 69           | **1.38x** |
| random    | 1M   | 3      | 62                   | 65           | 68           | **1.10x** |
| random    | 1M   | 5      | 45                   | 65           | 64           | **1.42x** |
| random    | 10M  | 3      | 4.3                  | 6.4          | 6.5          | **1.51x** |
| random    | 10M  | 5      | 3.5                  | 5.7          | 5.8          | **1.65x** |
| sorted    | 1M   | 3      | 920                  | 881          | 841          | 0.91x     |
| sorted    | 1M   | 5      | 611                  | 711          | 882          | **1.44x** |
| sorted    | 10M  | 3      | 69                   | 75           | 78           | **1.14x** |
| sorted    | 10M  | 5      | 55                   | 67           | 68           | **1.22x** |

**Data Patterns:**

- **clustered**: All field values increase with docID (e.g., time-series data where
timestamp, sequence number, and sensor readings grow together). Narrow query ranges eliminate most blocks. Best case for coordination (3.2–5.0x).
- **mixed**: Combination of monotonic (timestamp), low-cardinality (20 values, like order status), and random (price) fields. Resembles e-commerce order filtering. Small to moderate gains (1.03–1.70x).
- **sorted**: Index sorted by one field (timestamp), other fields random. Resembles time-series data indexed by ingestion time but queried on unsorted metric fields. Similar to mixed (1.14–1.44x), except for a slight regression at 1M docs with 3 fields (0.91x).
- **random**: All fields uniformly random with wide query ranges. Worst case for skipping, but still gains (1.10–1.65x): when one field eliminates a block, the other fields do not need to be checked for that block.
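
To make the coordination idea concrete, here is a minimal, self-contained sketch in plain Java. It is not the code from this PR and does not use the Lucene API: the class `CoordinatedSkipSketch`, its `Range` type, and the fixed-size blocks shared by all fields are assumptions made for illustration (a real DocValuesSkipper exposes per-field intervals that need not line up across fields). It only shows the principle that one field ruling out a block lets every other field skip that block too.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative model of coordinated block skipping; hypothetical, not Lucene code. */
class CoordinatedSkipSketch {

  /** Hypothetical inclusive range filter on one field. */
  static final class Range {
    final long min, max;
    Range(long min, long max) { this.min = min; this.max = max; }
    boolean contains(long v) { return v >= min && v <= max; }
  }

  /** How a block's [min, max] relates to a range. */
  enum Match { NO, MAYBE, YES }

  static Match classify(long blockMin, long blockMax, Range r) {
    if (blockMax < r.min || blockMin > r.max) return Match.NO;    // disjoint
    if (blockMin >= r.min && blockMax <= r.max) return Match.YES; // fully inside
    return Match.MAYBE;                                           // partial overlap
  }

  /** Returns {min, max} of values[start, end). A real index would store this precomputed. */
  static long[] blockStats(long[] values, int start, int end) {
    long min = Long.MAX_VALUE, max = Long.MIN_VALUE;
    for (int d = start; d < end; d++) {
      min = Math.min(min, values[d]);
      max = Math.max(max, values[d]);
    }
    return new long[] {min, max};
  }

  /** values[f][doc] is the value of field f for doc; ranges[f] is the filter on field f. */
  static List<Integer> search(long[][] values, Range[] ranges, int blockSize) {
    int numDocs = values[0].length;
    List<Integer> hits = new ArrayList<>();
    for (int start = 0; start < numDocs; start += blockSize) {
      int end = Math.min(start + blockSize, numDocs);
      boolean[] verify = new boolean[ranges.length]; // fields that need per-doc checks
      boolean skipBlock = false;
      for (int f = 0; f < ranges.length; f++) {
        long[] stats = blockStats(values[f], start, end);
        Match m = classify(stats[0], stats[1], ranges[f]);
        if (m == Match.NO) { // one field rules the block out: stop consulting the others
          skipBlock = true;
          break;
        }
        verify[f] = (m == Match.MAYBE);
      }
      if (skipBlock) continue;
      for (int doc = start; doc < end; doc++) {
        boolean ok = true;
        for (int f = 0; f < ranges.length && ok; f++) {
          if (verify[f]) ok = ranges[f].contains(values[f][doc]);
        }
        if (ok) hits.add(doc);
      }
    }
    return hits;
  }

  public static void main(String[] args) {
    long[][] values = {
      {10, 11, 12, 13, 20, 21, 22, 23}, // field 0: increases with docID
      {5, 50, 7, 60, 8, 55, 9, 70}      // field 1: unordered
    };
    Range[] ranges = {new Range(20, 30), new Range(50, 100)};
    System.out.println(search(values, ranges, 4)); // prints [5, 7]
  }
}
```

In this toy run, field 0 rules out the first block (docs 0 to 3), so field 1 is never consulted for it. In the second block field 0 matches entirely, so only field 1 is checked per document.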
