[
https://issues.apache.org/jira/browse/BEAM-11777?focusedWorklogId=594728&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-594728
]
ASF GitHub Bot logged work on BEAM-11777:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 11/May/21 19:14
Start Date: 11/May/21 19:14
Worklog Time Spent: 10m
Work Description: TheNeuralBit commented on a change in pull request
#14438:
URL: https://github.com/apache/beam/pull/14438#discussion_r630453070
##########
File path: sdks/python/apache_beam/dataframe/frames.py
##########
@@ -2321,18 +2410,23 @@ def __getitem__(self, name):
self._grouping_indexes,
projection=name)
- def agg(self, fn):
- if not callable(fn):
- # TODO: Add support for strings in (UN)LIFTABLE_AGGREGATIONS. Test by
- # running doctests for pandas.core.groupby.generic
- raise NotImplementedError('GroupBy.agg currently only supports callable '
- 'arguments')
- return DeferredDataFrame(
- expressions.ComputedExpression(
- 'agg',
- lambda gb: gb.agg(fn), [self._expr],
- requires_partition_by=partitionings.Index(),
- preserves_partition_by=partitionings.Singleton()))
+ def agg(self, fn, *args, **kwargs):
+ if callable(fn):
Review comment:
Good catch, I moved this to the end. It also exposed an edge case in
preagg proxy generation. I pushed ddd0885 to address this.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 594728)
Time Spent: 15h (was: 14h 50m)
> Support correct kwargs in aggregation methods on DataFrame, Series
> ------------------------------------------------------------------
>
> Key: BEAM-11777
> URL: https://issues.apache.org/jira/browse/BEAM-11777
> Project: Beam
> Issue Type: Improvement
> Components: sdk-py-core
> Reporter: Brian Hulette
> Priority: P2
> Labels: dataframe-api
> Time Spent: 15h
> Remaining Estimate: 0h
>
> {DataFrame,Series}.{all, any, max, min, prod, mean, median, sum} are all
> implemented via frame_base._agg_method, which just re-uses
> {DataFrame,Series}.agg. However, the pandas operations have some different
> kwargs that are not supported by agg. Some are universal (level=, skipna=),
> others are unique to each operation (numeric_only= or bool_only=).
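
The gap described above can be illustrated with a minimal pure-Python sketch. It is not Beam's frame_base._agg_method: the names agg, make_agg_method and make_agg_method_with_kwargs are hypothetical, plain lists stand in for a Series, and only the skipna= keyword is modelled (with the pandas default of skipna=True).

```python
import math


def agg(values, fn_name):
    """Generic aggregation (hypothetical); takes no operation-specific kwargs."""
    fns = {"sum": sum, "max": max, "min": min}
    return fns[fn_name](values)


def make_agg_method(fn_name):
    """Naive wrapper: the method simply re-uses agg(), so there is nowhere
    to pass a keyword such as skipna=."""
    def method(values):
        return agg(values, fn_name)
    return method


def make_agg_method_with_kwargs(fn_name):
    """Wrapper that handles a 'universal' kwarg itself before delegating.

    Assumption: skipna=True drops NaN values first, mirroring the pandas
    default for sum/max/min.
    """
    def method(values, skipna=True):
        if skipna:
            values = [v for v in values if not math.isnan(v)]
        return agg(values, fn_name)
    return method


naive_sum = make_agg_method("sum")
kwarg_sum = make_agg_method_with_kwargs("sum")

data = [1.0, float("nan"), 2.0]
naive_result = naive_sum(data)                  # NaN propagates through sum()
skipped_result = kwarg_sum(data)                # NaN dropped before summing
strict_result = kwarg_sum(data, skipna=False)   # NaN kept, as requested
```

In this sketch the naive wrapper cannot express the caller's intent at all, while the second wrapper resolves the keyword before delegating to the shared agg(). Operation-specific keywords (numeric_only=, bool_only=) would need similar per-method handling.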
--
This message was sent by Atlassian Jira
(v8.3.4#803005)