[jira] [Commented] (BEAM-11305) df.groupby(df.group) produces duplicate column for some aggregation functions

Beam JIRA Bot (Jira) Thu, 25 Feb 2021 09:19:05 -0800


    [ https://issues.apache.org/jira/browse/BEAM-11305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17291057#comment-17291057 ]


Beam JIRA Bot commented on BEAM-11305:
--------------------------------------

This issue is P2 but has been unassigned without any comment for 60 days so it 
has been labeled "stale-P2". If this issue is still affecting you, we care! 
Please comment and remove the label. Otherwise, in 14 days the issue will be 
moved to P3.

Please see https://beam.apache.org/contribute/jira-priorities/ for a detailed 
explanation of what these priorities mean.


> df.groupby(df.group) produces duplicate column for some aggregation functions
> -----------------------------------------------------------------------------
>
>                 Key: BEAM-11305
>                 URL: https://issues.apache.org/jira/browse/BEAM-11305
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core
>    Affects Versions: 2.25.0
>            Reporter: Brian Hulette
>            Priority: P2
>              Labels: stale-P2
>
> It should be possible to use {{df.groupby(df.group)}} or 
> {{df.groupby('group')}} and get the same result. Unfortunately, for some 
> aggregation functions (max, min, all, any), the former produces an output 
> with an extraneous 'group' column. Note that this doesn't happen for other 
> functions, such as size.
> In groupby, we should check whether the series is one of this dataframe's 
> columns when setting the index: 
> https://github.com/apache/beam/blob/cdb882d9ae554556156bff4843f18567b214df13/sdks/python/apache_beam/dataframe/frames.py#L156
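The check proposed above can be sketched in plain pandas. The helper `resolve_groupby_key` is hypothetical (it is not part of Beam or pandas), and it assumes that "one of this dataframe's columns" means a Series whose name is a column label and whose index and values match that column exactly. Beam's actual fix in frames.py may work differently; this only illustrates the idea of mapping a column Series back to its label so both spellings group the same way.

```python
import pandas as pd


def resolve_groupby_key(df: pd.DataFrame, by):
    """Hypothetical helper: if `by` is a Series that is really one of df's
    columns, return the column label instead, so that groupby(series) and
    groupby(label) treat it as the key and exclude it from the aggregated
    columns. Any other key is returned unchanged."""
    if isinstance(by, pd.Series) and by.name in df.columns:
        column = df[by.name]
        # Assumption: "same column" means identical index and values,
        # not necessarily the same Python object.
        if column.index.equals(by.index) and column.equals(by):
            return by.name
    return by


df = pd.DataFrame({"group": ["a", "b", "a"], "value": [1, 5, 3]})

# df.group resolves to the label 'group', so the result matches
# df.groupby('group').max() and has no extra 'group' column.
key = resolve_groupby_key(df, df.group)
print(df.groupby(key).max())
```

Comparing by name, index, and values rather than by object identity is a design choice in this sketch: a Series obtained through `df.group` may not be the same object as `df['group']`, so an identity check alone could miss it.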



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
