Bago Amirbekian created SPARK-21926:
---------------------------------------
Summary: Some transformers in spark.ml.feature fail when trying to transform streaming dataframes
Key: SPARK-21926
URL: https://issues.apache.org/jira/browse/SPARK-21926
Project: Spark
Issue Type: Bug
Components: ML, Structured Streaming
Affects Versions: 2.2.0
Reporter: Bago Amirbekian

We've run into a few cases where ML components don't play nice with streaming dataframes (for prediction). This ticket is meant to help aggregate these known cases in one place and provide a place to discuss possible fixes.

Failing cases:

1) VectorAssembler where one of the inputs is a VectorUDT column with no metadata.

Possible fixes:
a) Re-design VectorUDT metadata to support missing metadata for some elements. (This might be a good thing to do anyway; see SPARK-19141.)
b) Drop metadata in the streaming context.

2) OneHotEncoder where the input is a column with no metadata.

Possible fixes:
a) Make OneHotEncoder an estimator (SPARK-13030).
b) Allow the user to set the cardinality of OneHotEncoder.
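
To make fix 2b concrete, here is a minimal sketch in plain Python of one-hot encoding with a caller-supplied cardinality. It is not Spark code and does not reflect any existing or planned OneHotEncoder API: the function name one_hot and its parameters are hypothetical, and the sparse vector is represented as a simple (size, indices, values) tuple. The point it tries to illustrate is that once the cardinality is given up front, the output size no longer depends on column metadata, so each row can be encoded on its own.

```python
# Illustrative only: plain Python, not Spark's OneHotEncoder.
# `one_hot`, `cardinality` and `drop_last` are hypothetical names for this sketch.

def one_hot(index, cardinality, drop_last=True):
    """Encode a category index as a sparse triple (size, indices, values).

    The caller supplies the cardinality, so neither column metadata nor
    a pass over the data is needed to know the output vector size.
    """
    if not 0 <= index < cardinality:
        raise ValueError("index %d is outside [0, %d)" % (index, cardinality))
    size = cardinality - 1 if drop_last else cardinality
    if index < size:
        return (size, [index], [1.0])
    # With drop_last, the last category becomes the all-zeros vector.
    return (size, [], [])


# Each row is encoded independently of every other row.
rows = [0, 2, 1]
encoded = [one_hot(i, cardinality=3) for i in rows]
```

Under these assumptions the encoder carries no fitted state, which is what a per-row transform over an unbounded stream would need. Option 2a would get the same information a different way, by learning the cardinality once from a static dataset during fitting.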