[GitHub] [arrow] amol- commented on pull request #11624: ARROW-14608: [Python] Provide access to hash_aggregate functions through a Table.group_by method

GitBox Tue, 16 Nov 2021 06:13:17 -0800


amol- commented on pull request #11624:
URL: https://github.com/apache/arrow/pull/11624#issuecomment-970313614



   > pyarrow would add yet another slightly different interface.
   > (but I also agree that groupby is not a great name as method on the table 
for this reason)
   > 
   
   I don't have a strong opinion about the single-step or multi-step API. I 
personally have rarely needed to do a grouping without an associated 
aggregation, so I feel that the value of the multi-step approach isn't huge, 
even though it might be easier to evolve in the future.
   
   > Playing a bit with this branch, some other observations:
   > 
   > * I find it unexpected that the resulting table always has "key" column 
instead of reusing the original name that was specified as the key column
   > * Is it possible to group by multiple columns? Not in the current bindings 
in this PR, but I suppose in c++ / R this is already possible?
   > * I think users will very quickly request the ability to specify the 
resulting column name .. (to not have things like "column_count_distinct")
   
   I implemented support for the first two points in 
https://github.com/apache/arrow/pull/11624/commits/dfecba12901e6ff13181886b052164f734170d67
   Regarding the third one, I wonder if that would be best satisfied by 
extending the `Table.rename_columns` API to accept a mapping of column names,
   i.e.:
   ```
   t.rename_columns({"oldcolname": "newcolname"})
   ```
   That might be convenient for other use cases too (for example, when you want 
to rename only a subset of columns), and it would make it possible to write
   ```
   t.group_by("keycol", ["value1"], ["sum"]).rename_columns({"value1_sum": "total"})
   ```
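
Until such a mapping is accepted, the same effect can be approximated on top of the list-based `rename_columns`. This is only a rough sketch: `rename_subset` is a hypothetical helper name, not part of pyarrow, and it assumes the table exposes `column_names` and a `rename_columns(names)` method that takes the full list of names.

```python
def rename_subset(table, mapping):
    """Rename only the columns listed in `mapping`; keep all others as-is.

    Hypothetical helper: assumes `table` exposes `column_names` and a
    list-based `rename_columns(names)`, as pyarrow.Table does.
    """
    # Fail early on keys that do not match any existing column.
    unknown = set(mapping) - set(table.column_names)
    if unknown:
        raise KeyError(f"columns not found: {sorted(unknown)}")

    # Build the full list of names, substituting only the mapped ones.
    new_names = [mapping.get(name, name) for name in table.column_names]
    return table.rename_columns(new_names)
```

With this, the chained call above would read `rename_subset(t.group_by("keycol", ["value1"], ["sum"]), {"value1_sum": "total"})`, using the `group_by` signature proposed in this PR.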


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
