TheNeuralBit commented on code in PR #21803:
URL: https://github.com/apache/beam/pull/21803#discussion_r894921012
##########
sdks/python/apache_beam/ml/inference/sklearn_inference.py:
##########
@@ -94,9 +91,6 @@ class SklearnModelHandlerPandas(ModelHandler[pandas.DataFrame,
BaseEstimator]):
""" Implementation of the ModelHandler interface for scikit-learn that
supports pandas dataframes.
-
- NOTE: This API and its implementation are under development and
- do not provide backward compatibility guarantees.
Review Comment:
To be more specific about why this might change: now that the batching DoFn
infrastructure is in, I'd like the pandas sklearn implementation to
leverage it. We'd move to a model where the element type is a Beam Row (with a
schema) and the batch type is a pandas DataFrame, as opposed to the current
model, where the batch type is a list of single-element DataFrames.
Once we do that, we could pass data from the DataFrame API (under the hood, a
`PCollection[pd.DataFrame]`) directly to RunInference without having to
unbatch it and then batch it back up.
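
To illustrate the difference between the two batch shapes, here is a minimal pandas-only sketch. It does not use any Beam API. The column names `x` and `y` and the `predict` helper are hypothetical stand-ins for real features and a real scikit-learn model, and the `pd.concat` step is only an assumption about how a handler might stitch a list of single-row frames together before inference.

```python
import pandas as pd

# Hypothetical input: three elements, each a one-row DataFrame, mirroring
# the "list of single-element DataFrames" batch shape described above.
single_row_frames = [
    pd.DataFrame({"x": [1.0], "y": [2.0]}),
    pd.DataFrame({"x": [3.0], "y": [4.0]}),
    pd.DataFrame({"x": [5.0], "y": [6.0]}),
]


def predict(frame: pd.DataFrame) -> list:
    """Stand-in for a model's predict(): adds the two feature columns per row."""
    return (frame["x"] + frame["y"]).tolist()


# Current shape: the rows must first be stitched into one frame
# before a vectorized prediction call can run over them.
stitched = pd.concat(single_row_frames, ignore_index=True)
current_predictions = predict(stitched)

# Proposed shape: the batch is already a single DataFrame,
# so the prediction call can consume it directly with no concat step.
batch = pd.DataFrame({"x": [1.0, 3.0, 5.0], "y": [2.0, 4.0, 6.0]})
proposed_predictions = predict(batch)
```

Both paths produce the same predictions; the second simply skips the per-batch concatenation, which is the overhead the proposed Row/DataFrame typing would avoid.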