clintropolis commented on PR #19023: URL: https://github.com/apache/druid/pull/19023#issuecomment-3916756567
>I wonder if it's worth adding some sort of perf section to the CI. Assuming we mandate that all perf-sensitive changes add a corresponding benchmark, presumably we can create a unit test that checks for statistically significant regressions in most core areas of the code? This might help catch more issues before they hit a release. Just would need to limit the execution time of said benchmarks. I think that is a lot easier said than done, we'd have to be very strategic about what we run if we are talking about benchmarks like was added here, since the benchmarks require quite an absurd amount of time to run them all for all combinations. Just running `SqlExpressionBenchmark` with no specific parameters i see ``` # Run progress: 0.00% complete, ETA 43 days, 09:04:00 ``` which to be fair is quite a bit of an over estimate since many combinations are not run together (like we only test compression stuff on MMAP segments), so it wouldn't really be that long, but it would still be a very long time. If i restrict it to 1 set of combinations (and still run both non-vectorized and vectorized), e.g. `-p storageType=MMAP -p schemype=explicit -p jsonObjectStorageEncoding=NONE -p complexCompression=lz4 -p stringEncoding=UTF8` it still estimates almost 11 hours ``` # Run progress: 0.00% complete, ETA 10:50:40 ``` And this is just 1 benchmark. There are a ton of other files, many with just as many combinations. This is also ignoring that these benchmarks are only as good as their coverage. The thing this PR is improving basically only applied to front-coding with sufficiently large buckets along with queries using enough virtual columns in filters so that it became slower than not using the indexes. There were very many cases where using the indexes was an improvement, and even this query was faster when not using front-coding, so we would need to have the right combinations to be able to catch stuff before. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
