zanmato1984 commented on PR #41234: URL: https://github.com/apache/arrow/pull/41234#issuecomment-2121734448
Try to summarize: 1. The current `Grouper` (before this PR) uses the "sorted" approach, has no bug, and performs 15% faster than the "non-sorted" approach (introduced in this PR) for "sorted"-friendly scenarios. 2. The current `Grouper` (before this PR) doesn't support the "non-sorted" approach (or in other words, has bug if force enabled). 3. This PR enables the "non-sorted" approach (or in other words, fixes the bug in 2), as an user faced option, and performs 2%~4% faster than the original "sorted" approach for "non-sorted"-friendly scenarios. IIUC, then I think it might be more proper to treat this PR as an enhancement or even a small feature than a bug fix, because IMO it's not actually a bug if `Grouper` is not designed to use "non-sorted" approach from the beginning. And the question comes to: do we need this new "non-sorted" approach for `Grouper` with the given performance number? Personally I think we can have it, but I'm not confident (nor authorized) to make the call. So I'll just leave it to others :) cc @pitrou @felipecrv @westonpace -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
