TheNeuralBit commented on a change in pull request #16615:
URL: https://github.com/apache/beam/pull/16615#discussion_r803100247
##########
File path: sdks/python/apache_beam/dataframe/frames.py
##########
@@ -4625,9 +4625,48 @@ def repeat(self, repeats):
raise TypeError("str.repeat(repeats=) value must be an int or a "
f"DeferredSeries (encountered {type(repeats)}).")
- get_dummies = frame_base.wont_implement_method(
- pd.core.strings.StringMethods, 'get_dummies',
- reason='non-deferred-columns')
+ @frame_base.with_docs_from(pd.core.strings.StringMethods)
+ @frame_base.args_to_kwargs(pd.core.strings.StringMethods)
+ def get_dummies(self, **kwargs):
+ """
+ Series must be categorical type. Either cast to ``category`` to
+ infer categories, or preferred, cast to ``CategoricalDtype``
+ to ensure correct categories.
Review comment:
Yes, but that example only works if you have a concrete pandas `Series`.
Our users will be working with Beam `DeferredSeries` objects, which don't
support inferring a `CategoricalDtype` with `astype('category')`.
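
To illustrate the concern, here is a minimal pandas-only sketch (not Beam code). `part1` and `part2` are hypothetical stand-ins for two partitions of one dataset, and it uses `pd.get_dummies` rather than the `str.get_dummies` method under review. It shows why inferred categories depend on the data actually seen, while an explicit `CategoricalDtype` fixes the output columns up front:

```python
import pandas as pd

# Hypothetical partitions of one logical dataset; a deferred operation
# cannot inspect the data when the pipeline is constructed.
part1 = pd.Series(["a", "b"])
part2 = pd.Series(["b", "c"])

# astype("category") infers the categories from whatever values are present,
# so each partition ends up with a different set of categories.
inferred1 = list(part1.astype("category").cat.categories)
inferred2 = list(part2.astype("category").cat.categories)

# An explicit CategoricalDtype declares the categories ahead of time,
# so the resulting columns are the same for every partition.
explicit = pd.CategoricalDtype(categories=["a", "b", "c"])
cols1 = list(pd.get_dummies(part1.astype(explicit)).columns)
cols2 = list(pd.get_dummies(part2.astype(explicit)).columns)

print(inferred1, inferred2)
print(cols1, cols2)
```

With the explicit dtype, the column set is known without looking at the data, which is what a deferred implementation needs.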