[
https://issues.apache.org/jira/browse/BEAM-11305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17299734#comment-17299734
]
Beam JIRA Bot commented on BEAM-11305:
--------------------------------------
This issue was marked "stale-P2" and has not received a public comment in 14
days. It is now automatically moved to P3. If you are still affected by it, you
can comment and move it back to P2.
> df.groupby(df.group) produces duplicate column for some aggregation functions
> -----------------------------------------------------------------------------
>
> Key: BEAM-11305
> URL: https://issues.apache.org/jira/browse/BEAM-11305
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core
> Affects Versions: 2.25.0
> Reporter: Brian Hulette
> Priority: P3
>
> It should be possible to use {{df.groupby(df.group)}} or
> {{df.groupby('group')}} and get the same result. Unfortunately, for some
> aggregation functions (max, min, all, any), the former produces an output
> with an extraneous 'group' column. Note that this doesn't happen for some
> functions, such as size.
> In groupby, we should check whether the series is one of this dataframe's
> columns when setting the index:
> https://github.com/apache/beam/blob/cdb882d9ae554556156bff4843f18567b214df13/sdks/python/apache_beam/dataframe/frames.py#L156
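The check proposed in the description could look roughly like the sketch below. It uses plain pandas rather than Beam's deferred frames, and the helper name normalize_groupby_key is hypothetical (it is not part of Beam or pandas). It compares values eagerly, which a deferred Beam frame could not do as-is, so treat it only as an illustration of the idea, not as the actual fix.

```python
import pandas as pd


def normalize_groupby_key(df, by):
    """Hypothetical helper: if `by` is a Series that matches one of df's own
    columns, return that column's label; otherwise return `by` unchanged."""
    if isinstance(by, pd.Series) and by.name in df.columns:
        column = df[by.name]
        # df[label] is a DataFrame when column labels are duplicated; skip that case.
        if isinstance(column, pd.Series) and by.equals(column):
            return by.name
    return by


df = pd.DataFrame({"group": ["a", "b", "a"], "value": [1, 2, 3]})

# Grouping by the Series is rewritten to grouping by the label 'group'.
key = normalize_groupby_key(df, df.group)
by_label = df.groupby("group").max()
by_series = df.groupby(key).max()
```

With the key normalized to a label, both spellings go through the same code path, so the aggregated output has 'group' as the index and no extra 'group' column.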
--
This message was sent by Atlassian Jira
(v8.3.4#803005)