ianmcook commented on code in PR #12460:
URL: https://github.com/apache/arrow/pull/12460#discussion_r846110288
##########
docs/source/python/api/compute.rst:
##########
@@ -45,6 +45,21 @@ Aggregations
tdigest
variance
+Cumulative Functions
+--------------------
+
+Cumulative functions are vector functions that perform a running total on its
+input and outputs an array containing the corresponding intermediate running
+values.
Review Comment:
>Is this what's expected
Yes, I think so.
I am worried that users will _think_ this function does something like
this:
```python
>>> import pyarrow as pa
>>> import pyarrow.compute as pc
>>> t = pa.table({'x':[1, 2, 3, 4]})
>>> pc.cumulative_sum(t, ['x'])
pyarrow.Table
x: int64
----
x: [[1,3,6,10]]
```
That's what `pandas.DataFrame.cumsum` does, so users of PyArrow will expect
`pyarrow.compute.cumulative_sum` to do the same. But it doesn't.
This is less obviously a problem in PyArrow itself, but users of APIs that
create ExecPlans might think it works this way.
(P.S. There is currently no way for Arrow C++ compute functions to do what
my example here shows, because we can't deterministically preserve row order.
Later if we implement window functions, we will get this capability.)