Yanbo Liang created SPARK-14659:
-----------------------------------

             Summary: OneHotEncoder should support dropping the first category 
alphabetically in the encoded vector 
                 Key: SPARK-14659
                 URL: https://issues.apache.org/jira/browse/SPARK-14659
             Project: Spark
          Issue Type: Improvement
          Components: ML
            Reporter: Yanbo Liang


R formulas drop the first category, in alphabetical order, when encoding a 
string/categorical feature. Spark's RFormula uses OneHotEncoder to encode 
string/categorical features into vectors, but OneHotEncoder only supports 
"dropLast", which drops the last category as ordered by string/category 
frequency. As a result, SparkR produces different models from native R.
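
To illustrate the mismatch, here is a minimal sketch in plain Python (not 
Spark or R code). The helper names index_by_frequency, index_alphabetically 
and one_hot are hypothetical, and the alphabetical tie-breaking for equal 
frequencies is an assumption made only to keep the example deterministic. It 
shows that the two schemes can pick different reference (all-zero) categories 
for the same column.

```python
from collections import Counter


def index_by_frequency(values):
    """Assign indices by descending frequency (most frequent gets 0).

    Ties are broken alphabetically; this is an assumption for the sketch.
    """
    counts = Counter(values)
    ordered = sorted(counts, key=lambda c: (-counts[c], c))
    return {c: i for i, c in enumerate(ordered)}


def index_alphabetically(values):
    """Assign indices in alphabetical order of the category names."""
    return {c: i for i, c in enumerate(sorted(set(values)))}


def one_hot(value, index, drop):
    """One-hot encode `value`, omitting the first or last index.

    The omitted category becomes the all-zero reference level.
    """
    n = len(index)
    kept = range(1, n) if drop == "first" else range(n - 1)
    i = index[value]
    return [1.0 if i == k else 0.0 for k in kept]


values = ["a", "a", "a", "b", "b", "c"]
categories = sorted(set(values))

# "dropLast" on a frequency-ordered index: the least frequent category is dropped.
drop_last_by_freq = {
    c: one_hot(c, index_by_frequency(values), "last") for c in categories
}

# Drop-first on an alphabetical index: the alphabetically first category is dropped.
drop_first_alpha = {
    c: one_hot(c, index_alphabetically(values), "first") for c in categories
}

for c in categories:
    print(c, drop_last_by_freq[c], drop_first_alpha[c])
```

With this input, the frequency-based "dropLast" scheme uses "c" as the 
reference category, while the alphabetical drop-first scheme uses "a", so 
coefficients fitted on the two encodings are not directly comparable.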


