Re: [PR] add DictionaryEncodedValueIndex.getValueIterator and use it for ExpressionPredicateIndexSupplier (druid)

via GitHub Tue, 17 Feb 2026 11:48:31 -0800


clintropolis commented on PR #19023:
URL: https://github.com/apache/druid/pull/19023#issuecomment-3916756567


   >I wonder if it's worth adding some sort of perf section to the CI. Assuming 
we mandate that all perf-sensitive changes add a corresponding benchmark, 
presumably we can create a unit test that checks for statistically significant 
regressions in most core areas of the code? This might help catch more issues 
before they hit a release. Just would need to limit the execution time of said 
benchmarks.
   
   I think that is a lot easier said than done, we'd have to be very strategic 
about what we run if we are talking about benchmarks like was added here, since 
the benchmarks require quite an absurd amount of time to run them all for all 
combinations.
   
   Just running `SqlExpressionBenchmark` with no specific parameters i see 
   
   ```
   # Run progress: 0.00% complete, ETA 43 days, 09:04:00
   ```
   
   which to be fair is quite a bit of an over estimate since many combinations 
are not run together (like we only test compression stuff on MMAP segments), so 
it wouldn't really be that long, but it would still be a very long time.
   
   If i restrict it to 1 set of combinations (and still run both non-vectorized 
and vectorized), e.g. `-p storageType=MMAP -p schemype=explicit -p 
jsonObjectStorageEncoding=NONE -p complexCompression=lz4 -p stringEncoding=UTF8`
   
   it still estimates almost 11 hours
   
   ```
   # Run progress: 0.00% complete, ETA 10:50:40
   ```
   
   And this is just 1 benchmark. There are a ton of other files, many with just 
as many combinations.
   
   This is also ignoring that these benchmarks are only as good as their 
coverage. The thing this PR is improving basically only applied to front-coding 
with sufficiently large buckets along with queries using enough virtual columns 
in filters so that it became slower than not using the indexes. There were very 
many cases where using the indexes was an improvement, and even this query was 
faster when not using front-coding, so we would need to have the right 
combinations to be able to catch stuff before.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] add DictionaryEncodedValueIndex.getValueIterator and use it for ExpressionPredicateIndexSupplier (druid)

Reply via email to