[ 
https://issues.apache.org/jira/browse/BEAM-12169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17522897#comment-17522897
 ] 

Beam JIRA Bot commented on BEAM-12169:
--------------------------------------

This issue is assigned but has not received an update in 30 days so it has been 
labeled "stale-assigned". If you are still working on the issue, please give an 
update and remove the label. If you are no longer working on the issue, please 
unassign so someone else may work on it. In 7 days the issue will be 
automatically unassigned.

> Allow non-deferred column operations on categorical columns
> -----------------------------------------------------------
>
>                 Key: BEAM-12169
>                 URL: https://issues.apache.org/jira/browse/BEAM-12169
>             Project: Beam
>          Issue Type: Improvement
>          Components: dsl-dataframe, sdk-py-core
>            Reporter: Brian Hulette
>            Assignee: Andy Ye
>            Priority: P3
>              Labels: dataframe-api, stale-assigned
>          Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
> There are several operations that we currently disallow because they produce 
> a variable set of columns in the output based on the data 
> (non-deferred-columns). However, for some dtypes (categorical, boolean) we 
> can easily enumerate all the possible values that will be seen at execution 
> time, so we can predict the columns that will be seen.
> Note we still can't implement these operations 100% correctly, as pandas will 
> typically only create columns for the values that are {_}observed{_}, while 
> we'd have to create a column for every possible value.
> We should allow these operations in these special cases.
> Operations in this category:
>  - DataFrame.unstack, Series.unstack (can work if unstacked level is a 
> categorical or boolean column)
>  - Series.str.get_dummies
>  - Series.str.split
>  - Series.str.rsplit
>  - DataFrame.pivot
>  - DataFrame.pivot_table



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

Reply via email to