Brian Hulette created BEAM-12495:
------------------------------------

             Summary: DataFrame API: groupby(dropna=False) does not work when 
grouping on multiple columns or indexes
                 Key: BEAM-12495
                 URL: https://issues.apache.org/jira/browse/BEAM-12495
             Project: Beam
          Issue Type: Bug
          Components: sdk-py-core
            Reporter: Brian Hulette


{code}
df.groupby(['foo', 'bar'], dropna=False).sum()
{code}

This still drops groups whose keys contain NA from the output, even though 
dropna=False was passed.

This is due to pandas bug 
[36470|https://github.com/pandas-dev/pandas/issues/36470] "BUG: groupby(..., 
dropna=False) excludes NA values when grouping on MultiIndex levels".

We implement groupby by moving all grouping keys into the index and requiring 
Index() partitioning, so we always hit this pandas bug, even when the user is 
grouping on columns rather than index levels.
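
To make the difference concrete, here is a minimal pandas-only sketch (not Beam code) of the two paths. The data and the names value, by_columns and by_levels are illustrative assumptions, and the second path only approximates what the Beam implementation does internally. On pandas versions affected by the bug above, by_levels is expected to omit the groups whose keys contain NA; on versions where the bug is fixed, the two results should contain the same groups.

```python
import numpy as np
import pandas as pd

# Illustrative data: two of the four rows have an NA in one grouping key.
df = pd.DataFrame({
    "foo": ["a", "a", None, "b"],
    "bar": [1.0, np.nan, 2.0, 1.0],
    "value": [10, 20, 30, 40],
})

# Path 1: group directly on the columns, as in the user's call.
by_columns = df.groupby(["foo", "bar"], dropna=False).sum()

# Path 2: move the grouping columns into the index first, then group on the
# resulting MultiIndex levels. This is the shape described in the pandas bug.
by_levels = (
    df.set_index(["foo", "bar"])
    .groupby(level=["foo", "bar"], dropna=False)
    .sum()
)

print(by_columns)
print(by_levels)
```

Comparing the two printed frames on a given pandas version shows whether that version is affected.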



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
