damccorm opened a new issue, #20958:
URL: https://github.com/apache/beam/issues/20958
There are several operations that we currently disallow because the set of
columns in their output depends on the data (non-deferred-columns). However,
for some dtypes (categorical, boolean) we can easily enumerate every value
that could appear at execution time, so we can predict the output columns
at pipeline construction time.
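As a rough illustration of "enumerating the possible values", here is a minimal sketch in plain pandas. The helper name `possible_values` is hypothetical and is not part of Beam or pandas; it only shows that the value set can be read from the dtype alone, without looking at any data. It ignores missing values (for example `pd.NA` in a nullable boolean column).

```python
import numpy as np
import pandas as pd


def possible_values(dtype):
    """Hypothetical helper: list the values a column of this dtype can hold.

    Returns None when the values cannot be enumerated from the dtype alone.
    """
    # Check categorical first: its categories are part of the dtype itself.
    if isinstance(dtype, pd.CategoricalDtype):
        return list(dtype.categories)
    # Boolean columns can only hold False or True (missing values aside).
    if pd.api.types.is_bool_dtype(dtype):
        return [False, True]
    return None


cat_values = possible_values(pd.CategoricalDtype(["a", "b", "c"]))
bool_values = possible_values(np.dtype(bool))
int_values = possible_values(np.dtype("int64"))
```

For any other dtype (such as `int64` above) the helper returns `None`, which corresponds to the case where the operation would still have to be disallowed.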
Note that we still can't implement these operations 100% correctly: pandas
will typically only create columns for the values that are __observed__, while
we'd have to create a column for every possible value.
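The difference can be sketched in plain pandas. This is only an illustration under assumptions: the frame, the column names, and the declared dtype `col_dtype` are made up, and the data is kept as plain strings so that `pivot` shows the observed-only behavior. The `reindex` step stands in for "create a column for every possible value"; it is not how Beam implements anything.

```python
import pandas as pd

# Assumed declared dtype for the pivoted column: three possible values.
col_dtype = pd.CategoricalDtype(["x", "y", "z"])

# Example data in which "z" never occurs.
df = pd.DataFrame({
    "row": [0, 0, 1],
    "col": ["x", "y", "x"],
    "val": [1, 2, 3],
})

# pandas only creates columns for the values it actually sees: "x" and "y".
observed = df.pivot(index="row", columns="col", values="val")

# A data-independent schema needs one column per possible value, so the
# unobserved "z" column is added and filled with NaN.
padded = observed.reindex(columns=list(col_dtype.categories))
```

Here `observed` has columns `x` and `y`, while `padded` has `x`, `y`, and an all-NaN `z`, which is the discrepancy from pandas described above.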
We should allow these operations in these special cases.
Operations in this category:
- DataFrame.unstack, Series.unstack (can work if unstacked level is a
categorical or boolean column)
- Series.str.get_dummies
- Series.str.split
- Series.str.rsplit
- DataFrame.pivot
- DataFrame.pivot_table
Imported from Jira
[BEAM-12169](https://issues.apache.org/jira/browse/BEAM-12169). Original Jira
may contain additional context.
Reported by: bhulette.