gortiz commented on PR #8766: URL: https://github.com/apache/pinot/pull/8766#issuecomment-1137183022
I've added two new commits, creating what I called V2 and V3, and ran the benchmark for both. Taking into consideration what @richardstartin said about IdentityHashMap, I ran these tests twice: once using an IdentityHashMap and once using a normal HashMap.

The first commit, V2, applies what @Jackie-Jiang suggested: cache not only the hashes but also the converted values, and instead of doing that lazily, do it eagerly while the expression is visited during the prepare phase. This implies that the key used in the cache is a `Predicate`, not the `literals`. I also had to give up some type safety, because the value for the EQ operator is a single `CachedValue` while the value for the IN operator is a list of them. I could create two maps instead, but I don't like that solution either. Which one would you prefer? Do you have another solution in mind?

The second commit, V3, has a bigger impact (it cuts the cost in half!). The idea here is to add a cache of DataSource in ImmutableSegment. The flamegraphs created from the benchmark clearly show that a lot of time was spent computing this value. That makes sense: there are a lot of segments and we create a new instance for each one. In fact, the instance creation is expensive not because it has to allocate memory, but because the columns used as keys are different String instances with the same content. This forces HashMap to compare them with equals, which is relatively expensive when the operation itself is on the order of microseconds. In the benchmark each segment is read many times, so having the DataSource cached on every execution but the first is a huge advantage. The same condition should hold in a Pinot server, and the improvement should also apply to other methods that heavily use `getDataSource()`. As I said in another comment, I'm not sure about the correctness of this change.
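To make the V2 idea concrete, here is a minimal, hypothetical sketch of an eager per-predicate conversion cache. All type names (`PredicateValueCache`, `CachedValue`, the predicate records) are illustrative stand-ins for this comment, not Pinot's actual classes:

```java
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Pinot's real API. During the prepare phase each
// predicate's literal(s) are converted once and stored keyed by the Predicate
// instance itself. IdentityHashMap works here because the same Predicate
// objects are reused for every segment of a query, so reference equality is
// enough and we skip the cost of Predicate.equals()/hashCode().
public class PredicateValueCache {
  // Stand-ins for Pinot's Predicate and converted-value types.
  interface Predicate {}
  record EqPredicate(String literal) implements Predicate {}
  record InPredicate(List<String> literals) implements Predicate {}
  record CachedValue(Object converted, int hash) {}

  // Value is a single CachedValue for EQ but a List<CachedValue> for IN,
  // hence the Object value type (the type-safety compromise mentioned above).
  private final Map<Predicate, Object> _cache = new IdentityHashMap<>();

  // Eagerly convert while visiting the filter tree.
  void prepare(Predicate predicate) {
    if (predicate instanceof EqPredicate eq) {
      _cache.put(predicate, convert(eq.literal()));
    } else if (predicate instanceof InPredicate in) {
      _cache.put(predicate, in.literals().stream().map(this::convert).toList());
    }
  }

  Object lookup(Predicate predicate) {
    return _cache.get(predicate);
  }

  private CachedValue convert(String literal) {
    // Real code would convert per column data type; this just wraps the literal.
    return new CachedValue(literal, literal.hashCode());
  }

  public static void main(String[] args) {
    PredicateValueCache cache = new PredicateValueCache();
    Predicate eq = new EqPredicate("42");
    cache.prepare(eq);
    System.out.println(cache.lookup(eq) instanceof CachedValue); // true
  }
}
```

One alternative to the untyped `Object` value would be to always store a `List<CachedValue>` (a singleton list for EQ), trading a small allocation for type safety.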
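And a hedged sketch of the V3 idea: memoizing the per-column DataSource inside the segment so only the first `getDataSource()` call pays the construction cost. Again, the types here are illustrative stand-ins, and whether caching is actually safe is exactly the correctness question raised above:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a DataSource cache in an immutable segment.
// The cache key must be compared with equals() (not identity), because the
// column names arriving from different queries are distinct String instances
// with the same content -- the very equals() cost described above. The win is
// that we pay one equals-based lookup instead of rebuilding the DataSource.
public class CachedSegment {
  static class DataSource {
    final String column;
    DataSource(String column) { this.column = column; }
  }

  private final Map<String, DataSource> _dataSourceCache = new ConcurrentHashMap<>();
  int constructions = 0; // instrumentation for this demo only

  DataSource getDataSource(String column) {
    // computeIfAbsent builds the DataSource once per column and reuses it.
    return _dataSourceCache.computeIfAbsent(column, col -> {
      constructions++;
      return new DataSource(col); // expensive in the real code
    });
  }

  public static void main(String[] args) {
    CachedSegment segment = new CachedSegment();
    // Simulate the pruner reading the same column on every query, with
    // distinct String instances of equal content, as in the benchmark.
    for (int i = 0; i < 1000; i++) {
      segment.getDataSource(new String("myColumn"));
    }
    System.out.println(segment.constructions); // 1
  }
}
```

This sketch assumes the DataSource for a column never changes over the lifetime of an immutable segment; if it can, the cache would need invalidation, which is the part that needs confirmation from someone who knows the codebase.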
It seems correct, but I would like confirmation from someone who knows the codebase better.

The results of the benchmark are the following.

Before any change:
```
Benchmark                          (_numRows)  (_numSegments)  Mode  Cnt    Score    Error  Units
BenchmarkServerSegmentPruner.query         10              10  avgt    5    1.426 ±  0.003  us/op
BenchmarkServerSegmentPruner.query         10             100  avgt    5   14.719 ±  1.342  us/op
BenchmarkServerSegmentPruner.query         10            1000  avgt    5  174.879 ±  5.115  us/op
```
V1 (yesterday):
```
Benchmark                          (_numRows)  (_numSegments)  Mode  Cnt    Score    Error  Units
BenchmarkServerSegmentPruner.query         10              10  avgt    5    0.972 ±  0.003  us/op
BenchmarkServerSegmentPruner.query         10             100  avgt    5   10.149 ±  0.046  us/op
BenchmarkServerSegmentPruner.query         10            1000  avgt    5  135.063 ± 14.670  us/op
```
V2 with HashMap:
```
Benchmark                               (_numRows)  (_numSegments)  Mode  Cnt    Score   Error  Units
BenchmarkColumnValueSegmentPruner.query         10              10  avgt    5    1.325 ± 0.029  us/op
BenchmarkColumnValueSegmentPruner.query         10             100  avgt    5   12.851 ± 0.171  us/op
BenchmarkColumnValueSegmentPruner.query         10            1000  avgt    5  138.314 ± 2.238  us/op
```
V2 with IdentityHashMap:
```
Benchmark                               (_numRows)  (_numSegments)  Mode  Cnt    Score   Error  Units
BenchmarkColumnValueSegmentPruner.query         10              10  avgt    5    1.110 ± 0.001  us/op
BenchmarkColumnValueSegmentPruner.query         10             100  avgt    5   10.659 ± 0.139  us/op
BenchmarkColumnValueSegmentPruner.query         10            1000  avgt    5  120.608 ± 0.159  us/op
```
V3 with HashMap:
```
Benchmark                               (_numRows)  (_numSegments)  Mode  Cnt    Score   Error  Units
BenchmarkColumnValueSegmentPruner.query         10              10  avgt    5    0.971 ± 0.003  us/op
BenchmarkColumnValueSegmentPruner.query         10             100  avgt    5    4.843 ± 0.045  us/op
BenchmarkColumnValueSegmentPruner.query         10            1000  avgt    5   72.658 ± 0.735  us/op
```
V3 with IdentityHashMap:
```
Benchmark                               (_numRows)  (_numSegments)  Mode  Cnt    Score   Error  Units
BenchmarkColumnValueSegmentPruner.query         10              10  avgt    5    0.556 ± 0.001  us/op
BenchmarkColumnValueSegmentPruner.query         10             100  avgt    5    4.964 ± 0.268  us/op
BenchmarkColumnValueSegmentPruner.query         10            1000  avgt    5   58.304 ± 0.382  us/op
```
I cannot
print them all in a single JMH execution because each variant requires some modifications to the code. I will try to edit this message to add a table that makes the differences clear, but the TL;DR is that by merging these changes we should get an almost 57% improvement when pruning 1000 segments.
