[I] How to perform group_by on a Table on equally spaced interval specified by a freq input [arrow]

via GitHub Sat, 06 Apr 2024 01:12:43 -0700


nikfio opened a new issue, #41049:
URL: https://github.com/apache/arrow/issues/41049


   ### Describe the enhancement requested
   
   Hello everyone,
   
   
   I am looking for a way to perform group_by on a Table over equally spaced 
intervals, where the interval width is given by a freq value applied to the 
key column (the first group_by argument, documented as 'Name of the grouped 
columns').
   
   In pandas this can be done like this (most compact), where `freq` is the 
resampling rule and 'timestamp' is the datetime column:
   
   `
   df = dataframe.set_index('timestamp').resample(freq).interpolate(method='nearest')
   `
   
   In polars it can be done like this (this would be the closest analogue for 
a pyarrow Table group_by):
   
   `
   dataframe.group_by_dynamic(timestamp, every=tf).agg(
       col('value').first().alias('first')
   )
   `
   
   
   Does anyone have a suggestion?
   I think this is needed functionality and that Table should have this feature.
   To implement it, I was thinking about:
   
   1. partitioning the Table into equal parts according to a specified `freq` 
input value, as polars does
   2. execute the group_by on each partition
   3. concatenate the partitions in one final Table
   
   What do you think about it?
   How could I do point 1?
   
   
   Thanks,
   Nick 
   
   
   ### Component(s)
   
   Python


