Yanbo Liang created SPARK-14659:
-----------------------------------
Summary: OneHotEncoder support drop first category alphabetically
in the encoded vector
Key: SPARK-14659
URL: https://issues.apache.org/jira/browse/SPARK-14659
Project: Spark
Issue Type: Improvement
Components: ML
Reporter: Yanbo Liang
R formula drops the first category alphabetically when encoding a
string/category feature. Spark RFormula uses OneHotEncoder to encode
string/category features into vectors, but OneHotEncoder only supports
"dropLast", which drops the last category when categories are ordered by
string/category frequency. As a result, SparkR produces different models
from native R.
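
To illustrate the mismatch, here is a minimal sketch in plain Python (not
Spark or R code). The function names are hypothetical, and the alphabetical
tie-breaking in the frequency ordering is an assumption made only so the
example is deterministic.

```python
from collections import Counter


def encode_drop_first_alphabetical(values):
    """R-style sketch: order categories alphabetically, drop the first one."""
    categories = sorted(set(values))
    kept = categories[1:]  # the first category becomes the reference level
    rows = [[1.0 if v == c else 0.0 for c in kept] for v in values]
    return kept, rows


def encode_drop_last_by_frequency(values):
    """dropLast-style sketch: order categories by descending frequency, drop the last one."""
    counts = Counter(values)
    # Assumption: ties in frequency are broken alphabetically.
    categories = sorted(counts, key=lambda c: (-counts[c], c))
    kept = categories[:-1]  # the least frequent category becomes the reference level
    rows = [[1.0 if v == c else 0.0 for c in kept] for v in values]
    return kept, rows


values = ["a", "a", "a", "b", "b", "c"]
r_kept, r_rows = encode_drop_first_alphabetical(values)
spark_kept, spark_rows = encode_drop_last_by_frequency(values)

print("drop first (alphabetical):", r_kept)
print("drop last (by frequency): ", spark_kept)
```

With this input the two schemes keep different categories and therefore
pick different reference levels, so a model fit on each encoding would
have different coefficients even though the data is the same.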
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)