Andrey Yatsuk created SPARK-19947:
-------------------------------------
Summary: RFormulaModel always throws Exception on transforming data with NULL or Unseen labels
Key: SPARK-19947
URL: https://issues.apache.org/jira/browse/SPARK-19947
Project: Spark
Issue Type: Improvement
Components: ML
Affects Versions: 2.1.0
Reporter: Andrey Yatsuk
I have a trained ML model and a large data table stored in Parquet. I want to add
a new column with the predicted values to this table. I can't lose any rows, but
the data can contain null values.
RFormula.fit() creates a new StringIndexer with the default handleInvalid="error"
setting, so any null or unseen label makes the transformation throw an exception.
VectorAssembler also throws an exception on null values. As a result, I have to
call df.na.drop() before transforming this DataFrame, which I don't want to do.
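To make the failure mode concrete, here is a minimal plain-Java sketch (no Spark dependency) of how an invalid-label policy changes the outcome of indexing. The class LabelIndexerSketch and its "error"/"skip"/"keep" modes are hypothetical, loosely modeled on StringIndexer's handleInvalid option; this is an illustration, not Spark's implementation. With "error", a single null or unseen label aborts the whole transformation; with "skip", the row is dropped, which is the data loss described above; a "keep"-style mode maps such labels to one extra index instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of a string indexer's invalid-label handling.
// Not Spark code: it only illustrates why an "error" default forces na.drop().
public class LabelIndexerSketch {

    private final Map<String, Integer> labelToIndex = new HashMap<>();
    private final String handleInvalid; // "error", "skip" or "keep" in this sketch

    public LabelIndexerSketch(List<String> trainingLabels, String handleInvalid) {
        // Assign indices in first-seen order (a simplification for this sketch).
        for (String label : trainingLabels) {
            if (label != null && !labelToIndex.containsKey(label)) {
                labelToIndex.put(label, labelToIndex.size());
            }
        }
        this.handleInvalid = handleInvalid;
    }

    /** Returns one index per kept row; rows skipped under "skip" are omitted. */
    public List<Integer> transform(List<String> labels) {
        List<Integer> out = new ArrayList<>();
        for (String label : labels) {
            Integer idx = (label == null) ? null : labelToIndex.get(label);
            if (idx != null) {
                out.add(idx);
            } else if (handleInvalid.equals("skip")) {
                // Row is silently dropped: this is the data loss to avoid.
                continue;
            } else if (handleInvalid.equals("keep")) {
                // Null and unseen labels share one extra bucket after the known labels.
                out.add(labelToIndex.size());
            } else {
                // "error": one bad row fails the whole transformation.
                throw new IllegalArgumentException("Unseen or null label: " + label);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> train = Arrays.asList("a", "b", "a");
        List<String> score = Arrays.asList("a", null, "c", "b");
        System.out.println(new LabelIndexerSketch(train, "skip").transform(score)); // [0, 1]
        System.out.println(new LabelIndexerSketch(train, "keep").transform(score)); // [0, 2, 2, 1]
        try {
            new LabelIndexerSketch(train, "error").transform(score);
        } catch (IllegalArgumentException e) {
            System.out.println("error mode: " + e.getMessage());
        }
    }
}
```

Only the "keep"-style behavior returns one output per input row, which is what adding a prediction column without dropping data requires.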
Proposed fix: add a new parameter to RFormula, similar to handleInvalid in
StringIndexer. Alternatively, add a transform(Seq<Column>): Vector method that
users could call as a UDF, for example:
df.withColumn("predicted", functions.callUDF(rFormulaModel::transform, Seq<Column>))
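The second proposal amounts to a row-level function that returns a missing value instead of throwing, so that a UDF built on it would produce a null entry for a bad row rather than fail the job. Below is a minimal plain-Java sketch of that contract. RowEncoderSketch, encode, and the "one-hot category followed by one numeric value" layout are hypothetical illustrations of the idea, not RFormulaModel's API; the Spark UDF wiring itself is omitted.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical per-row encoder illustrating the requested contract:
// return null for rows that cannot be encoded instead of throwing.
public class RowEncoderSketch {

    private final Map<String, Integer> categoryIndex = new LinkedHashMap<>();

    public RowEncoderSketch(List<String> knownCategories) {
        // Categories seen at training time get consecutive indices.
        for (String c : knownCategories) {
            categoryIndex.putIfAbsent(c, categoryIndex.size());
        }
    }

    /**
     * Encodes one row as [one-hot(category)..., numeric].
     * Returns null if either input is null or the category was not seen in training.
     */
    public double[] encode(String category, Double numeric) {
        if (category == null || numeric == null) {
            return null; // missing input: no exception, just no features
        }
        Integer idx = categoryIndex.get(category);
        if (idx == null) {
            return null; // unseen category: no exception, just no features
        }
        double[] features = new double[categoryIndex.size() + 1];
        features[idx] = 1.0;
        features[features.length - 1] = numeric;
        return features;
    }

    public static void main(String[] args) {
        RowEncoderSketch enc = new RowEncoderSketch(Arrays.asList("red", "green"));
        System.out.println(Arrays.toString(enc.encode("green", 3.5))); // [0.0, 1.0, 3.5]
        System.out.println(Arrays.toString(enc.encode("blue", 1.0)));  // null (unseen)
        System.out.println(Arrays.toString(enc.encode("red", null)));  // null (missing)
    }
}
```

Returning null keeps every input row in the output, at the cost of pushing the "what to predict for a bad row" decision to the caller.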
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)