rtpsw commented on PR #14352: URL: https://github.com/apache/arrow/pull/14352#issuecomment-1279817920
Notes about the recent commit: - The PyArrow `GroupBy` API currently does not expose the `segment_keys` parameter. I think this should be done in a separate task. - The C++ `GroupBy` API currently does not support streaming. While the `Grouper` class it relies on has been refactored to make it easy to add streaming support, I think this is not a priority for the current PR because the important API is the aggregate node. - The aggregate node API supports streaming. However, the existing tests only cover the case of an empty `segment_keys` parameter. These tests do cover the aggregate node's code in the PR but not that of all `GroupingSegment` implementations. I intend to add more tests here. - The `SegmentKeyWithChunkedArray` test only covers chunked arrays and a single-key segmenter. I intend to add tests for other datum kinds and segmenter types. I would be good to get feedback on the approach for this PR even before I add further tests. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
