Bago Amirbekian created SPARK-21926:
---------------------------------------

             Summary: Some transformers in spark.ml.feature fail when trying to 
transform streaming dataframes
                 Key: SPARK-21926
                 URL: https://issues.apache.org/jira/browse/SPARK-21926
             Project: Spark
          Issue Type: Bug
          Components: ML, Structured Streaming
    Affects Versions: 2.2.0
            Reporter: Bago Amirbekian


We've run into a few cases where ML components fail when applied to streaming 
dataframes (for prediction). This ticket is meant to aggregate these known 
cases in one place and provide a place to discuss possible fixes.

Failing cases:
1) VectorAssembler where one of the inputs is a VectorUDT column with no 
metadata.
Possible fixes:
a) Redesign VectorUDT metadata to support missing metadata for some elements. 
(This might be a good thing to do anyway; see SPARK-19141.)
b) Drop metadata in a streaming context.
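
To make case 1 concrete, here is a minimal sketch in plain Python (not Spark 
code). It assumes the failure comes from the assembler having to look at the 
data to learn the size of a vector column that carries no metadata, which a 
streaming dataframe does not allow. The function `assemble` and its `sizes` 
argument are hypothetical names used only for illustration. They show one way 
the sizes could be declared up front so that no row has to be inspected before 
the transform starts.

```python
from typing import Dict, List, Sequence, Union

# A column value is either a scalar or a dense vector of floats.
Value = Union[float, Sequence[float]]


def assemble(row: Dict[str, Value],
             input_cols: List[str],
             sizes: Dict[str, int]) -> List[float]:
    """Concatenate the input columns of one row into a flat vector.

    `sizes` is a hypothetical, user-declared map from column name to
    vector length (1 for scalars). Because sizes are declared up front,
    the output width is known without looking at any data.
    """
    out: List[float] = []
    for col in input_cols:
        value = row[col]
        if isinstance(value, (int, float)):
            items = [float(value)]
        else:
            items = [float(v) for v in value]
        # Validate each row against the declared size instead of
        # inferring the size from the first row.
        if len(items) != sizes[col]:
            raise ValueError(
                f"column {col!r}: expected size {sizes[col]}, got {len(items)}"
            )
        out.extend(items)
    return out


def output_size(input_cols: List[str], sizes: Dict[str, int]) -> int:
    """Width of the assembled vector, computed from declarations only."""
    return sum(sizes[c] for c in input_cols)
```

With sizes declared this way, the output width is available before any row 
arrives, and a row whose vector has the wrong length fails with a clear error 
instead of silently producing a misaligned feature vector.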

2) OneHotEncoder where the input is a column with no metadata.
Possible fixes:
a) Make OneHotEncoder an estimator (SPARK-13030).
b) Allow user to set the cardinality of OneHotEncoder.
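
Fix (b) for case 2 can be sketched the same way in plain Python (again, not 
Spark code). The function `one_hot` and its `cardinality` parameter are 
hypothetical. The point is that once the user supplies the number of 
categories, the output vector size is fixed without reading column metadata or 
scanning the data. The `drop_last` flag mirrors the existing dropLast 
behaviour of OneHotEncoder, where the last category maps to an all-zeros 
vector.

```python
from typing import List


def one_hot(index: int, cardinality: int, drop_last: bool = True) -> List[float]:
    """Encode a category index as a one-hot vector.

    `cardinality` is a hypothetical user-set parameter: the total number
    of categories. The output size depends only on it and on `drop_last`,
    never on the data itself.
    """
    if not 0 <= index < cardinality:
        raise ValueError(f"index {index} out of range for cardinality {cardinality}")
    size = cardinality - 1 if drop_last else cardinality
    vec = [0.0] * size
    # With drop_last, the last category is encoded as all zeros.
    if index < size:
        vec[index] = 1.0
    return vec
```

Making OneHotEncoder an estimator, as in fix (a), would reach the same result 
by learning the cardinality from a static training dataframe at fit time 
rather than asking the user for it.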



