Brian Hulette created BEAM-12495:
------------------------------------
Summary: DataFrame API: groupby(dropna=False) does not work when
grouping on multiple columns or indexes
Key: BEAM-12495
URL: https://issues.apache.org/jira/browse/BEAM-12495
Project: Beam
Issue Type: Bug
Components: sdk-py-core
Reporter: Brian Hulette
{code}
df.groupby(['foo', 'bar'], dropna=False).sum()
{code}
This still drops groups whose keys contain NA from the output.
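For reference, plain pandas keeps the NA-keyed groups when grouping on columns with dropna=False (the parameter was added in pandas 1.1). That is the output Beam should match. The sample frame and its values below are hypothetical.

```python
import pandas as pd

# Hypothetical sample data: None marks a missing group key.
df = pd.DataFrame({
    "foo": ["a", "a", None],
    "bar": ["x", None, "y"],
    "baz": [1, 2, 3],
})

# Grouping on columns with dropna=False keeps groups whose keys contain NA,
# so ('a', 'x'), ('a', NaN) and (NaN, 'y') each survive as their own group.
expected = df.groupby(["foo", "bar"], dropna=False).sum()
print(expected)
```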
This is due to pandas bug
[36470|https://github.com/pandas-dev/pandas/issues/36470] "BUG: groupby(...,
dropna=False) excludes NA values when grouping on MultiIndex levels".
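The issue does not say which pandas releases are affected, so a quick way to check the installed version is to run the same aggregation once on columns and once on MultiIndex levels and compare. The helper name level_grouping_drops_na is hypothetical; a sketch:

```python
import pandas as pd


def level_grouping_drops_na() -> bool:
    """Return True if the installed pandas loses NA-keyed groups when grouping
    on MultiIndex levels with dropna=False (the behavior in pandas #36470)."""
    df = pd.DataFrame({
        "foo": ["a", "a", None],
        "bar": ["x", None, "y"],
        "baz": [1, 2, 3],
    })
    by_columns = df.groupby(["foo", "bar"], dropna=False).sum()
    by_levels = (
        df.set_index(["foo", "bar"])
        .groupby(level=["foo", "bar"], dropna=False)
        .sum()
    )
    # Both queries describe the same groups, so fewer rows on the level path
    # means NA-keyed groups were dropped there.
    return len(by_levels) < len(by_columns)


print(pd.__version__, level_grouping_drops_na())
```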
We implement groupby by moving the grouping keys into the index and requiring
Index() partitioning, so we always hit this bug, even when the user is grouping
on columns rather than index levels.
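To make the reasoning concrete, here is a pure-pandas sketch of the strategy described above, assuming hash partitioning on the index. It is not Beam's actual implementation; the function name index_partitioned_groupby_sum and the num_partitions parameter are hypothetical. Because every partition is grouped by index level, a column groupby rewritten this way takes the same code path as the MultiIndex case, and it drops NA-keyed groups exactly when that path does.

```python
import pandas as pd


def index_partitioned_groupby_sum(df, keys, num_partitions=2):
    """Hypothetical sketch: move the grouping columns into the index,
    partition rows by a hash of the index, then sum each partition grouped
    on index levels. Not Beam's actual code."""
    indexed = df.set_index(keys)
    # Rows with equal index values hash equally, so each group lands
    # entirely in one partition and per-partition results can be concatenated.
    hashes = pd.util.hash_pandas_object(indexed.index, index=False)
    partition_ids = hashes.to_numpy() % num_partitions
    results = []
    for pid in range(num_partitions):
        part = indexed[partition_ids == pid]
        if len(part):
            # Grouping on levels is the path affected by pandas #36470.
            results.append(part.groupby(level=keys, dropna=False).sum())
    if not results:
        # Empty input: fall back to grouping the (empty) indexed frame.
        return indexed.groupby(level=keys, dropna=False).sum()
    return pd.concat(results)
```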