[
https://issues.apache.org/jira/browse/BEAM-12495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17386397#comment-17386397
]
Beam JIRA Bot commented on BEAM-12495:
--------------------------------------
This issue is assigned but has not received an update in 30 days so it has been
labeled "stale-assigned". If you are still working on the issue, please give an
update and remove the label. If you are no longer working on the issue, please
unassign so someone else may work on it. In 7 days the issue will be
automatically unassigned.
> DataFrame API: groupby(dropna=False) still drops NAs when grouping on
> multiple columns or indexes
> -------------------------------------------------------------------------------------------------
>
> Key: BEAM-12495
> URL: https://issues.apache.org/jira/browse/BEAM-12495
> Project: Beam
> Issue Type: Bug
> Components: dsl-dataframe, sdk-py-core
> Reporter: Brian Hulette
> Assignee: Brian Hulette
> Priority: P2
> Labels: dataframe-api, stale-assigned
> Time Spent: 2h 10m
> Remaining Estimate: 0h
>
> {code}
> df.groupby(['foo', 'bar'], dropna=False).sum()
> {code}
> This still drops groups whose keys contain NA from the output.
> This is caused by pandas bug
> [36470|https://github.com/pandas-dev/pandas/issues/36470] "BUG: groupby(...,
> dropna=False) excludes NA values when grouping on MultiIndex levels".
> We implement groupby by moving all grouped data into the index and requiring
> Index() partitioning, so we always hit this bug, even when the
> user is grouping on columns rather than indexes.
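
The difference described above can be checked in plain pandas with a minimal sketch like the one below. The frame and its column names (foo, bar, value) are made up for illustration, and the set_index step only approximates "moving grouped data into the index"; it is not Beam's actual implementation. Whether the two row counts differ depends on whether the installed pandas version is affected by issue 36470, so no expected output is shown.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: both grouping keys contain an NA value.
df = pd.DataFrame({
    "foo": ["a", "a", np.nan, "b"],
    "bar": [1.0, np.nan, 2.0, 3.0],
    "value": [10, 20, 30, 40],
})

# Grouping directly on columns: dropna=False keeps the groups with NA keys.
by_columns = df.groupby(["foo", "bar"], dropna=False).sum()

# Grouping on MultiIndex levels after moving the keys into the index,
# a rough stand-in for the approach the description outlines.
indexed = df.set_index(["foo", "bar"])
by_levels = indexed.groupby(level=["foo", "bar"], dropna=False).sum()

# On a pandas version affected by issue 36470, by_levels has fewer rows
# than by_columns because the NA groups are dropped.
print("groups when grouping on columns:", len(by_columns))
print("groups when grouping on index levels:", len(by_levels))
```

If the two counts match, the installed pandas version no longer reproduces the upstream bug for this input.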