Xiangrui Meng created SPARK-7921:
------------------------------------

             Summary: Change includeFirst to dropLast in OneHotEncoder
                 Key: SPARK-7921
                 URL: https://issues.apache.org/jira/browse/SPARK-7921
             Project: Spark
          Issue Type: Improvement
          Components: ML
    Affects Versions: 1.4.0
            Reporter: Xiangrui Meng
            Assignee: Xiangrui Meng


Change includeFirst to dropLast and leave the default as true. There are a couple
of benefits:

a. consistent with other tutorials of one-hot encoding (or dummy coding) (e.g., 
http://www.ats.ucla.edu/stat/mult_pkg/faq/general/dummy.htm)
b. keep the indices unmodified in the output vector. If we drop the first, all 
indices will be shifted by 1.
c. If users use StringIndexer, the last index is the least frequent label.
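
To illustrate points (b) and (c), here is a minimal sketch in plain Python. It
is not Spark's API: the function names index_by_frequency, one_hot_drop_last
and one_hot_drop_first are hypothetical, and the alphabetical tie-break is an
assumption made only for this example.

```python
from collections import Counter


def index_by_frequency(values):
    """Assign indices by descending frequency, so the rarest label gets the last index.

    Ties are broken alphabetically (an assumption for this sketch).
    """
    counts = Counter(values)
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: i for i, label in enumerate(ordered)}


def one_hot_drop_last(index, num_categories):
    """Drop the last category: index i stays at position i in the output."""
    vec = [0.0] * (num_categories - 1)
    if index < num_categories - 1:
        vec[index] = 1.0
    return vec  # the last category is encoded as all zeros


def one_hot_drop_first(index, num_categories):
    """Drop the first category: every remaining index is shifted down by 1."""
    vec = [0.0] * (num_categories - 1)
    if index > 0:
        vec[index - 1] = 1.0
    return vec  # the first category is encoded as all zeros


labels = ["a", "b", "a", "c", "a", "b"]
mapping = index_by_frequency(labels)  # "a" is most frequent, "c" is rarest
n = len(mapping)

for label in sorted(mapping, key=mapping.get):
    i = mapping[label]
    print(label, i, one_hot_drop_last(i, n), one_hot_drop_first(i, n))
```

With the last category dropped, label "b" (index 1) sets position 1 of the
output vector; with the first dropped, the same label sets position 0. The
category that becomes the all-zero reference is "c", the least frequent label,
rather than "a", the most frequent one.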



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
