[
https://issues.apache.org/jira/browse/SPARK-21926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Bago Amirbekian updated SPARK-21926:
------------------------------------
Description:
We've run into a few cases where ML components don't play nice with streaming
dataframes (for prediction). This ticket is meant to help aggregate these known
cases in one place and provide a place to discuss possible fixes.
Failing cases:
1) VectorAssembler where one of the inputs is a VectorUDT column with no
metadata.
Possible fixes:
More details in SPARK-22346.
2) OneHotEncoder where the input is a column with no metadata.
Possible fixes:
a) Make OneHotEncoder an estimator (SPARK-13030).
b) Allow user to set the cardinality of OneHotEncoder.
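
Both failing cases appear to come down to the same thing: the transformer
needs to know its output width (vector size, number of categories) before it
sees any rows, and without column metadata the only other source is the data
itself, which a streaming dataframe cannot scan up front. The sketch below is
plain Python and does not use Spark; the function names (one_hot,
infer_cardinality, assemble) are hypothetical and only illustrate the idea
behind fix 2b and the analogous size declaration for case 1.

```python
from typing import List, Sequence


def one_hot(index: int, cardinality: int) -> List[float]:
    """Encode a category index as a vector of fixed, user-supplied width."""
    if not 0 <= index < cardinality:
        raise ValueError(f"index {index} is outside [0, {cardinality})")
    vec = [0.0] * cardinality
    vec[index] = 1.0
    return vec


def infer_cardinality(indices: Sequence[int]) -> int:
    """Batch-style inference: requires a full pass over the data,
    which is not possible on an unbounded stream."""
    return max(indices) + 1


def assemble(vectors: Sequence[Sequence[float]], sizes: Sequence[int]) -> List[float]:
    """Concatenate vectors whose sizes were declared up front
    (hypothetical stand-in for metadata on a vector column)."""
    if len(vectors) != len(sizes):
        raise ValueError("one declared size is needed per input vector")
    out: List[float] = []
    for vec, size in zip(vectors, sizes):
        if len(vec) != size:
            raise ValueError(f"expected a vector of size {size}, got {len(vec)}")
        out.extend(vec)
    return out


# With a declared cardinality every row encodes independently of the others.
encoded = [one_hot(i, cardinality=4) for i in (0, 2, 3)]

# With declared sizes the assembled width is known before any row arrives.
assembled = assemble([[1.0, 2.0], [3.0]], sizes=[2, 1])
```

The point of the sketch is only that a declared width makes each row
processable on its own; how that declaration should be exposed in the ML API
is what the linked tickets discuss.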
was:
We've run into a few cases where ML components don't play nice with streaming
dataframes (for prediction). This ticket is meant to help aggregate these known
cases in one place and provide a place to discuss possible fixes.
Failing cases:
1) VectorAssembler where one of the inputs is a VectorUDT column with no
metadata.
Possible fixes:
I've created a JIRA to track this.
2) OneHotEncoder where the input is a column with no metadata.
Possible fixes:
a) Make OneHotEncoder an estimator (SPARK-13030).
b) Allow user to set the cardinality of OneHotEncoder.
> Compatibility between ML Transformers and Structured Streaming
> --------------------------------------------------------------
>
> Key: SPARK-21926
> URL: https://issues.apache.org/jira/browse/SPARK-21926
> Project: Spark
> Issue Type: Umbrella
> Components: ML, Structured Streaming
> Affects Versions: 2.2.0
> Reporter: Bago Amirbekian