rtpsw commented on PR #14352:
URL: https://github.com/apache/arrow/pull/14352#issuecomment-1279817920

   Notes about the recent commit:
   - The PyArrow `GroupBy` API currently does not expose the `segment_keys` 
parameter. I think this should be done in a separate task.
   - The C++ `GroupBy` API currently does not support streaming. While the 
`Grouper` class it relies on has been refactored to make it easy to add 
streaming support, I think this is not a priority for the current PR because 
the important API is the aggregate node.
   - The aggregate node API supports streaming. However, the existing tests 
only cover the case of an empty `segment_keys` parameter. These tests do cover 
the aggregate node's code in the PR but not that of all `GroupingSegment` 
implementations. I intend to add more tests here.
   - The `SegmentKeyWithChunkedArray` test only covers chunked arrays and a 
single-key segmenter. I intend to add tests for other datum kinds and segmenter 
types.
   
   I would be good to get feedback on the approach for this PR even before I 
add further tests.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to