This is an automated email from the ASF dual-hosted git repository.
jorisvandenbossche pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git
The following commit(s) were added to refs/heads/main by this push:
new 24ba1cf75c GH-34579: [Python][Docs] TableGroupBy.aggregate options
(#34759)
24ba1cf75c is described below
commit 24ba1cf75c01040b279514ecc063975de272b766
Author: Alenka Frim <[email protected]>
AuthorDate: Wed Mar 29 16:47:33 2023 +0200
GH-34579: [Python][Docs] TableGroupBy.aggregate options (#34759)
### Rationale for this change
Add more information and examples to the `pa.TableGroupBy.aggregate` docstring
to make the method easier to use.
### What changes are included in this PR?
Changes to the `pa.TableGroupBy.aggregate` docstring include:
- a link to
https://arrow.apache.org/docs/python/compute.html#grouped-aggregations
- extra examples
* Closes: #34579
Lead-authored-by: Alenka Frim <[email protected]>
Co-authored-by: Alenka Frim <[email protected]>
Co-authored-by: Joris Van den Bossche <[email protected]>
Signed-off-by: Joris Van den Bossche <[email protected]>
---
python/pyarrow/table.pxi | 41 +++++++++++++++++++++++++++++++++++++++++
1 file changed, 41 insertions(+)
diff --git a/python/pyarrow/table.pxi b/python/pyarrow/table.pxi
index 9da7110a1d..278025a04e 100644
--- a/python/pyarrow/table.pxi
+++ b/python/pyarrow/table.pxi
@@ -5514,6 +5514,9 @@ list[tuple(str, str, FunctionOptions)]
column names, for unary, nullary and n-ary aggregation functions
respectively.
+ For the list of function names and respective aggregation
+ function options see :ref:`py-grouped-aggrs`.
+
Returns
-------
Table
@@ -5526,6 +5529,9 @@ list[tuple(str, str, FunctionOptions)]
... pa.array(["a", "a", "b", "b", "c"]),
... pa.array([1, 2, 3, 4, 5]),
... ], names=["keys", "values"])
+
+ Sum the column "values" over the grouped column "keys":
+
>>> t.group_by("keys").aggregate([("values", "sum")])
pyarrow.Table
values_sum: int64
@@ -5533,6 +5539,9 @@ list[tuple(str, str, FunctionOptions)]
----
values_sum: [[3,7,5]]
keys: [["a","b","c"]]
+
+ Count the rows over the grouped column "keys":
+
>>> t.group_by("keys").aggregate([([], "count_all")])
pyarrow.Table
count_all: int64
@@ -5540,6 +5549,38 @@ list[tuple(str, str, FunctionOptions)]
----
count_all: [[2,2,1]]
keys: [["a","b","c"]]
+
+ Do multiple aggregations:
+
+ >>> t.group_by("keys").aggregate([
+ ... ("values", "sum"),
+ ... ("keys", "count")
+ ... ])
+ pyarrow.Table
+ values_sum: int64
+ keys_count: int64
+ keys: string
+ ----
+ values_sum: [[3,7,5]]
+ keys_count: [[2,2,1]]
+ keys: [["a","b","c"]]
+
+ Count the number of non-null values for column "values"
+ over the grouped column "keys":
+
+ >>> import pyarrow.compute as pc
+ >>> t.group_by(["keys"]).aggregate([
+ ... ("values", "count", pc.CountOptions(mode="only_valid"))
+ ... ])
+ pyarrow.Table
+ values_count: int64
+ keys: string
+ ----
+ values_count: [[2,2,1]]
+ keys: [["a","b","c"]]
+
+ Get a single row for each group in column "keys":
+
>>> t.group_by("keys").aggregate([])
pyarrow.Table
keys: string