xiangfu0 commented on PR #17872:
URL: https://github.com/apache/pinot/pull/17872#issuecomment-4115993260
## Updated JMH Benchmark Results
**Setup:** 1M docs, 4 cardinalities × 5 selectivities × 3 paths,
`@Fork(value=2, warmups=0)`, `@Warmup(iterations=2)`,
`@Measurement(iterations=3)`, JDK 17, ~34 min total
### Sorted index path (μs/op, lower is better)
| dictCard | 0.1% (1K docs) | 1% (10K) | 10% (100K) | 50% (500K) | 100% (1M)
|
|----------|----------------|----------|------------|------------|-----------|
| 100 | 1.3 | 1.6 | 1.1 | 1.3 | 1.2 |
| 1K | 7.7 | 13.8 | 10.5 | 9.6 | 9.1 |
| 10K | 45.7 | 81.3 | 101 | 86.0 | 95.7 |
| 100K | 243 | 301 | 888 | 822 | 646 |
### Bitmap inverted index path (μs/op)
| dictCard | 0.1% (1K docs) | 1% (10K) | 10% (100K) | 50% (500K) | 100% (1M)
|
|----------|----------------|----------|------------|------------|-----------|
| 100 | 38 | 19 | 1.6 | 1.0 | 1.0 |
| 1K | 4,256 | 767 | 27 | 16 | 13 |
| 10K | 12,979 | 14,053 | 369 | 179 | 137 |
| 100K | 35,459 | 61,037 | 10,635 | 5,780 | 3,522 |
### Scan path (μs/op)
| dictCard | 0.1% (1K docs) | 1% (10K) | 10% (100K) | 50% (500K) | 100% (1M)
|
|----------|----------------|----------|------------|------------|-----------|
| 100 | 9 | 120 | 2,357 | 11,064 | 20,114 |
| 1K | 17 | 246 | 3,157 | 15,996 | 30,353 |
| 10K | 19 | 342 | 697 | 2,388 | 4,323 |
| 100K | 19 | 609 | 1,518 | 4,220 | 7,680 |
### Cost heuristic crossover validation
| dictCard | configured ratio | heuristic threshold | actual crossover |
|----------|-----------------|--------------------|-----------------|
| 100 | 30x | 3,000 filtered docs | inverted wins at 1%+ (10K docs) ✓ |
| 1K | 30x | 30,000 filtered docs | inverted wins at 10%+ (100K docs) ✓ |
| 10K | 10x | 100,000 filtered docs | inverted wins at 10%+ (100K docs) ✓ |
| 100K | 6x | 600,000 filtered docs | inverted wins at 100% (1M docs) ✓ |
### Key takeaways
- **Sorted index** is the fastest path across all configurations (1–888 μs),
always chosen when a sorted column is available
- **Bitmap inverted index** dominates scan when `filteredDocs >> dictCard`
(e.g., dictCard=100 at 10%: **1.6 μs vs 2,357 μs = 1,473x faster**)
- **Scan** dominates inverted index when dictCard is high relative to
filtered docs (e.g., dictCard=1K at 0.1%: **17 μs vs 4,256 μs**)
- Per-cardinality cost ratios (30x / 10x / 6x) correctly steer the heuristic
at all crossover points
<details>
<summary>Raw JMH output</summary>
```
Benchmark (_dictionaryCardinality)
(_filterSelectivity) (_numDocs) Mode Cnt Score Error Units
BenchmarkInvertedIndexDistinct.invertedIndexPath 100
0.001 1000000 avgt 6 38.231 ± 1.357 us/op
BenchmarkInvertedIndexDistinct.invertedIndexPath 100
0.01 1000000 avgt 6 18.694 ± 4.612 us/op
BenchmarkInvertedIndexDistinct.invertedIndexPath 100
0.1 1000000 avgt 6 1.584 ± 0.168 us/op
BenchmarkInvertedIndexDistinct.invertedIndexPath 100
0.5 1000000 avgt 6 0.983 ± 0.104 us/op
BenchmarkInvertedIndexDistinct.invertedIndexPath 100
1.0 1000000 avgt 6 0.972 ± 0.017 us/op
BenchmarkInvertedIndexDistinct.invertedIndexPath 1000
0.001 1000000 avgt 6 4256.134 ± 157.568 us/op
BenchmarkInvertedIndexDistinct.invertedIndexPath 1000
0.01 1000000 avgt 6 766.973 ± 25.905 us/op
BenchmarkInvertedIndexDistinct.invertedIndexPath 1000
0.1 1000000 avgt 6 26.504 ± 1.263 us/op
BenchmarkInvertedIndexDistinct.invertedIndexPath 1000
0.5 1000000 avgt 6 15.891 ± 0.426 us/op
BenchmarkInvertedIndexDistinct.invertedIndexPath 1000
1.0 1000000 avgt 6 12.598 ± 0.295 us/op
BenchmarkInvertedIndexDistinct.invertedIndexPath 10000
0.001 1000000 avgt 6 12979.261 ± 6223.626 us/op
BenchmarkInvertedIndexDistinct.invertedIndexPath 10000
0.01 1000000 avgt 6 14053.085 ± 417.580 us/op
BenchmarkInvertedIndexDistinct.invertedIndexPath 10000
0.1 1000000 avgt 6 369.112 ± 22.802 us/op
BenchmarkInvertedIndexDistinct.invertedIndexPath 10000
0.5 1000000 avgt 6 179.168 ± 14.620 us/op
BenchmarkInvertedIndexDistinct.invertedIndexPath 10000
1.0 1000000 avgt 6 137.372 ± 6.651 us/op
BenchmarkInvertedIndexDistinct.invertedIndexPath 100000
0.001 1000000 avgt 6 35458.867 ± 25631.306 us/op
BenchmarkInvertedIndexDistinct.invertedIndexPath 100000
0.01 1000000 avgt 6 61036.517 ± 4329.001 us/op
BenchmarkInvertedIndexDistinct.invertedIndexPath 100000
0.1 1000000 avgt 6 10635.409 ± 501.969 us/op
BenchmarkInvertedIndexDistinct.invertedIndexPath 100000
0.5 1000000 avgt 6 5779.762 ± 394.506 us/op
BenchmarkInvertedIndexDistinct.invertedIndexPath 100000
1.0 1000000 avgt 6 3521.908 ± 160.910 us/op
BenchmarkInvertedIndexDistinct.scanPath 100
0.001 1000000 avgt 6 9.273 ± 0.510 us/op
BenchmarkInvertedIndexDistinct.scanPath 100
0.01 1000000 avgt 6 119.694 ± 19.105 us/op
BenchmarkInvertedIndexDistinct.scanPath 100
0.1 1000000 avgt 6 2356.700 ± 129.508 us/op
BenchmarkInvertedIndexDistinct.scanPath 100
0.5 1000000 avgt 6 11064.090 ± 314.943 us/op
BenchmarkInvertedIndexDistinct.scanPath 100
1.0 1000000 avgt 6 20114.177 ± 520.504 us/op
BenchmarkInvertedIndexDistinct.scanPath 1000
0.001 1000000 avgt 6 17.063 ± 0.130 us/op
BenchmarkInvertedIndexDistinct.scanPath 1000
0.01 1000000 avgt 6 245.697 ± 21.773 us/op
BenchmarkInvertedIndexDistinct.scanPath 1000
0.1 1000000 avgt 6 3156.635 ± 65.038 us/op
BenchmarkInvertedIndexDistinct.scanPath 1000
0.5 1000000 avgt 6 15996.288 ± 1633.042 us/op
BenchmarkInvertedIndexDistinct.scanPath 1000
1.0 1000000 avgt 6 30352.554 ± 535.360 us/op
BenchmarkInvertedIndexDistinct.scanPath 10000
0.001 1000000 avgt 6 19.081 ± 0.530 us/op
BenchmarkInvertedIndexDistinct.scanPath 10000
0.01 1000000 avgt 6 342.450 ± 20.136 us/op
BenchmarkInvertedIndexDistinct.scanPath 10000
0.1 1000000 avgt 6 697.304 ± 12.680 us/op
BenchmarkInvertedIndexDistinct.scanPath 10000
0.5 1000000 avgt 6 2388.036 ± 35.300 us/op
BenchmarkInvertedIndexDistinct.scanPath 10000
1.0 1000000 avgt 6 4323.113 ± 342.050 us/op
BenchmarkInvertedIndexDistinct.scanPath 100000
0.001 1000000 avgt 6 19.445 ± 2.081 us/op
BenchmarkInvertedIndexDistinct.scanPath 100000
0.01 1000000 avgt 6 608.963 ± 51.412 us/op
BenchmarkInvertedIndexDistinct.scanPath 100000
0.1 1000000 avgt 6 1517.888 ± 137.011 us/op
BenchmarkInvertedIndexDistinct.scanPath 100000
0.5 1000000 avgt 6 4219.561 ± 248.307 us/op
BenchmarkInvertedIndexDistinct.scanPath 100000
1.0 1000000 avgt 6 7680.396 ± 136.475 us/op
BenchmarkInvertedIndexDistinct.sortedIndexPath 100
0.001 1000000 avgt 6 1.264 ± 0.144 us/op
BenchmarkInvertedIndexDistinct.sortedIndexPath 100
0.01 1000000 avgt 6 1.629 ± 0.034 us/op
BenchmarkInvertedIndexDistinct.sortedIndexPath 100
0.1 1000000 avgt 6 1.125 ± 0.085 us/op
BenchmarkInvertedIndexDistinct.sortedIndexPath 100
0.5 1000000 avgt 6 1.341 ± 0.092 us/op
BenchmarkInvertedIndexDistinct.sortedIndexPath 100
1.0 1000000 avgt 6 1.161 ± 0.042 us/op
BenchmarkInvertedIndexDistinct.sortedIndexPath 1000
0.001 1000000 avgt 6 7.746 ± 0.986 us/op
BenchmarkInvertedIndexDistinct.sortedIndexPath 1000
0.01 1000000 avgt 6 13.783 ± 0.340 us/op
BenchmarkInvertedIndexDistinct.sortedIndexPath 1000
0.1 1000000 avgt 6 10.462 ± 0.610 us/op
BenchmarkInvertedIndexDistinct.sortedIndexPath 1000
0.5 1000000 avgt 6 9.629 ± 0.501 us/op
BenchmarkInvertedIndexDistinct.sortedIndexPath 1000
1.0 1000000 avgt 6 9.134 ± 0.771 us/op
BenchmarkInvertedIndexDistinct.sortedIndexPath 10000
0.001 1000000 avgt 6 45.663 ± 3.165 us/op
BenchmarkInvertedIndexDistinct.sortedIndexPath 10000
0.01 1000000 avgt 6 81.316 ± 3.230 us/op
BenchmarkInvertedIndexDistinct.sortedIndexPath 10000
0.1 1000000 avgt 6 101.034 ± 8.342 us/op
BenchmarkInvertedIndexDistinct.sortedIndexPath 10000
0.5 1000000 avgt 6 85.989 ± 12.125 us/op
BenchmarkInvertedIndexDistinct.sortedIndexPath 10000
1.0 1000000 avgt 6 95.657 ± 56.279 us/op
BenchmarkInvertedIndexDistinct.sortedIndexPath 100000
0.001 1000000 avgt 6 242.897 ± 34.804 us/op
BenchmarkInvertedIndexDistinct.sortedIndexPath 100000
0.01 1000000 avgt 6 300.715 ± 54.706 us/op
BenchmarkInvertedIndexDistinct.sortedIndexPath 100000
0.1 1000000 avgt 6 888.095 ± 202.339 us/op
BenchmarkInvertedIndexDistinct.sortedIndexPath 100000
0.5 1000000 avgt 6 821.560 ± 244.944 us/op
BenchmarkInvertedIndexDistinct.sortedIndexPath 100000
1.0 1000000 avgt 6 646.489 ± 24.215 us/op
```
</details>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]