[ 
https://issues.apache.org/jira/browse/BEAM-11303?focusedWorklogId=513824&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-513824
 ]

ASF GitHub Bot logged work on BEAM-11303:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 18/Nov/20 22:33
            Start Date: 18/Nov/20 22:33
    Worklog Time Spent: 10m 
      Work Description: TheNeuralBit merged pull request #13379:
URL: https://github.com/apache/beam/pull/13379


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 513824)
    Time Spent: 20m  (was: 10m)

> DataFrame GroupBy().size() aggregation produces incorrect results
> -----------------------------------------------------------------
>
>                 Key: BEAM-11303
>                 URL: https://issues.apache.org/jira/browse/BEAM-11303
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core
>    Affects Versions: 2.25.0
>            Reporter: Brian Hulette
>            Assignee: Brian Hulette
>            Priority: P2
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> size is treated as a liftable aggregation, which assumes it is commutative and 
> associative, but it is not actually associative: applying size to partial sizes 
> counts the partial results rather than the rows. It can still be lifted, but the 
> post-aggregation step needs to be a sum.
> As currently implemented, the size aggregation will produce incorrect results.
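
The problem described above can be illustrated with a minimal sketch in plain
Python. This is not Beam's implementation: the function names and the
two-partition data are hypothetical, and Counter stands in for a per-key
size aggregation. It only shows why re-applying size to partial results is
wrong while summing them is right.

```python
from collections import Counter

def partial_size(rows):
    """Pre-aggregation: count rows per key within one partition."""
    return Counter(key for key, _ in rows)

def combine_with_size(partials):
    """Re-apply size to the partial results (the buggy lifting).

    This counts how many partitions contain each key, not how many rows.
    """
    return Counter(key for partial in partials for key in partial)

def combine_with_sum(partials):
    """Post-aggregation as a sum of the partial sizes per key."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts
    return total

# Hypothetical rows of (key, value), split across two partitions.
partition_a = [("x", 1), ("x", 2), ("y", 3)]
partition_b = [("x", 4), ("y", 5), ("y", 6), ("y", 7)]

partials = [partial_size(partition_a), partial_size(partition_b)]

# Reference answer: size computed over all rows at once.
expected = partial_size(partition_a + partition_b)

wrong = combine_with_size(partials)
right = combine_with_sum(partials)

print("expected:", dict(expected))
print("size of sizes:", dict(wrong))
print("sum of sizes:", dict(right))
```

In this sketch the sum of partial sizes matches the unpartitioned result,
while the size of partial sizes only reports the number of partitions in
which each key appears.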



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
