clintropolis opened a new pull request #10743: URL: https://github.com/apache/druid/pull/10743
### Description This PR fixes an issue where queries with subtotals and limits can have incorrect results due to the way which `LimitedBufferHashGrouper` operates, using a min-max heap which is destroyed upon iteration, while subtotals require multiple iterations on the same grouper with different dimension sets. To fix this, we can exploit that the min-max heap is stored as a contiguous array, and the continual removal of the min element (which is the first element of the backing array) and subsequent rebalance of the heap that follows will leave the a newly freed space at the end of the array, allowing us to write the sorted array of offsets in reverse. To illustrate with an example, here is how the min-max heap normally behaves (with the data in the backing buffer included) ``` adding 1 [1] backing-array:[1] adding 2 [1, 2] backing-array:[1, 2] adding 3 [1, 2, 3] backing-array:[1, 2, 3] adding 5 [1, 5, 3, 2] backing-array:[1, 5, 3, 2] adding 8 [1, 8, 3, 2, 5] backing-array:[1, 8, 3, 2, 5] removed min: 1 [2, 8, 3, 5] backing-array:[2, 8, 3, 5, 5] removed min: 2 [3, 8, 5] backing-array:[3, 8, 5, 5, 5] removed min: 3 [5, 8] backing-array:[5, 8, 5, 5, 5] removed min: 5 [8] backing-array:[8, 8, 5, 5, 5] removed min: 8 [] backing-array:[8, 8, 5, 5, 5] ``` Here is the same set of operations, but with back-filling the sorted array as we remove the heap: ``` adding 1 [1] backing-array:[1] adding 2 [1, 2] backing-array:[1, 2] adding 3 [1, 2, 3] backing-array:[1, 2, 3] adding 5 [1, 5, 3, 2] backing-array:[1, 5, 3, 2] adding 8 [1, 8, 3, 2, 5] backing-array:[1, 8, 3, 2, 5] removed min: 1 set position: 4 [2, 8, 3, 5] backing-array:[2, 8, 3, 5, 1] removed min: 2 set position: 3 [3, 8, 5] backing-array:[3, 8, 5, 2, 1] removed min: 3 set position: 2 [5, 8] backing-array:[5, 8, 3, 2, 1] removed min: 5 set position: 1 [8] backing-array:[8, 5, 3, 2, 1] removed min: 8 set position: 0 [] backing-array:[8, 5, 3, 2, 1] ``` I don't believe that new offsets will be added to this heap _after_ it has been iterated, but could not definitively prove it, so I took a perhaps aggressive approach and decided that this should be an exception state. The previous behavior was likely broken, since iterating the heap is destructive, adding a new offset and then iterating again would result in an iterator which only produced the new offset. I did not want to preserve this behavior, but it would be possible if it is correct for some reason (no other grouper behaves like this afaict). If adding new offsets does legitimately happen after iteration, when adding a new offset after the grouper has been iterated we could re-heapify from the sorted array, prior to adding the new offset to the rebuilt heap. I didn't want to add the extra code for this unless strictly necessary, so an exception will rather noisily let us know that it happens and we need to make this change, but it should be easy enough to do if necessary. <hr> This PR has: - [ ] been self-reviewed. - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.) - [ ] added documentation for new or modified features or behaviors. - [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links. - [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/licenses.yaml) - [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader. - [ ] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met. - [ ] added integration tests. - [ ] been tested in a test Druid cluster. <hr> ##### Key changed/added classes in this PR * `LimitedBufferHashGrouper` * `ByteBufferMinMaxOffsetHeap` ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org