rluvaton commented on PR #15022: URL: https://github.com/apache/datafusion/pull/15022#issuecomment-2818864086
> Thank you @rluvaton. I had some difficulty to understand what does this PR actually solve. If you can share a real case to demonstrate how this order in metadata works in a real use case, it would greatly help in understanding the need for this change. AFAICS this path is only executed during spill scenarios at the moment.

This exposes to GroupAccumulator whether the group indices are sorted or not. That lets a group accumulator apply optimizations that depend on sorted input, for example keeping only the state of the current group.

One concrete case is count distinct. When the group indices are sorted, once a group no longer appears in the input you know it will not appear again, so you can clear the internal hash set that was used to track the unique values in that group.

> How does spilling disrupt the order, and how does this fix restore it? Are there any other use cases for this feature as well?

Spilling does not disrupt the order. When there is a spill, we go to the merge phase and sort all the spill files into one sorted stream, so we can now take advantage of that ordering by adapting our implementation.

Also, this is not a fix but rather propagating knowledge that the operator already has.
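To make the count distinct case concrete, here is a minimal standalone sketch. It does not use DataFusion's accumulator API; the function name `count_distinct_sorted` and the `(group, count)` output shape are hypothetical, chosen only for illustration. It assumes the group indices arrive in non-decreasing order, which is what allows a single hash set to be reused and cleared each time the group changes.

```rust
use std::collections::HashSet;

/// Hypothetical helper (not part of DataFusion): counts distinct values per
/// group, ASSUMING `group_indices` is sorted in non-decreasing order.
/// Because of that assumption, only one hash set is kept at any time.
fn count_distinct_sorted(group_indices: &[usize], values: &[i64]) -> Vec<(usize, usize)> {
    assert_eq!(group_indices.len(), values.len());

    let mut results = Vec::new();
    let mut current: Option<usize> = None;
    let mut seen: HashSet<i64> = HashSet::new();

    for (&group, &value) in group_indices.iter().zip(values) {
        if current != Some(group) {
            // The previous group cannot appear again in sorted input,
            // so emit its count and drop its tracked values.
            if let Some(prev) = current {
                results.push((prev, seen.len()));
                seen.clear();
            }
            current = Some(group);
        }
        seen.insert(value);
    }

    // Emit the final group, if there was any input at all.
    if let Some(last) = current {
        results.push((last, seen.len()));
    }
    results
}
```

Calling `count_distinct_sorted(&[0, 0, 0, 1, 1, 2], &[10, 10, 20, 5, 6, 7])` yields one `(group, count)` pair per group while holding at most one group's values in memory. Without the sorted guarantee, an accumulator would have to keep one hash set per group until the end of the input.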