nikfio opened a new issue, #41049:
URL: https://github.com/apache/arrow/issues/41049
### Describe the enhancement requested
Hello everyone,
I am looking for a way to run `group_by` on a Table over equally spaced
intervals of the key column, with the interval width given by a `freq` value.
The key column is the first `group_by` argument, documented as 'Name of the
grouped columns'.
In pandas this can be done like this (most compact):
```
# freq is a pandas offset alias such as '5min'; 'timestamp' is the datetime column
df = dataframe.resample(freq, on='timestamp').interpolate(method='nearest')
```
In polars it can be done like this (the closest equivalent to a pyarrow
Table group_by):
```
dataframe.group_by_dynamic(
    timestamp,
    every=tf,
).agg(col('value').first().alias('first'))
```
Does anyone have a suggestion?
I think this is needed functionality, and Table should support it.
To implement this feature, I was thinking about:
1. partitioning the Table into equal parts according to a specified `freq`
input value, as polars does
2. executing the group_by on each partition
3. concatenating the partitions into one final Table
What do you think about it?
How could I do step 1?
Thanks,
Nick
### Component(s)
Python