[ https://issues.apache.org/jira/browse/SPARK-7921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xiangrui Meng closed SPARK-7921.
--------------------------------
    Resolution: Fixed
    Fix Version/s: 1.4.0

> Change includeFirst to dropLast in OneHotEncoder
> ------------------------------------------------
>
>                 Key: SPARK-7921
>                 URL: https://issues.apache.org/jira/browse/SPARK-7921
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 1.4.0
>            Reporter: Xiangrui Meng
>            Assignee: Xiangrui Meng
>             Fix For: 1.4.0
>
>
> Change includeFirst to dropLast and keep the default as true. There are
> a couple of benefits:
> a. It is consistent with other tutorials on one-hot encoding (or dummy coding)
> (e.g., http://www.ats.ucla.edu/stat/mult_pkg/faq/general/dummy.htm).
> b. It keeps the indices unmodified in the output vector. If we drop the first,
> all indices will be shifted by 1.
> c. If users use StringIndexer, the last element is the least frequent one.
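
Points (b) and (c) above can be illustrated with a small sketch in plain Python. It does not use Spark and is not the OneHotEncoder implementation; the helper names `string_index`, `one_hot`, and `one_hot_drop_first` are hypothetical, and the alphabetical tie-break in `string_index` is an assumption made only so the example is deterministic.

```python
from collections import Counter


def string_index(values):
    """Map each label to an index, most frequent label first.

    Assumption: ties are broken alphabetically, purely for this sketch.
    """
    counts = Counter(values)
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: i for i, label in enumerate(ordered)}


def one_hot(index, num_categories, drop_last=True):
    """One-hot encode `index`; with drop_last the last category is all zeros."""
    size = num_categories - 1 if drop_last else num_categories
    vec = [0.0] * size
    if index < size:
        vec[index] = 1.0  # position in the vector equals the category index
    return vec


def one_hot_drop_first(index, num_categories):
    """Alternative that drops the first category: every index shifts by 1."""
    vec = [0.0] * (num_categories - 1)
    if index > 0:
        vec[index - 1] = 1.0
    return vec


labels = ["a", "b", "a", "c", "a", "b"]
index_of = string_index(labels)  # "a" is most frequent, "c" least frequent
n = len(index_of)

for label in ["a", "b", "c"]:
    i = index_of[label]
    print(label, i, one_hot(i, n), one_hot_drop_first(i, n))
```

With drop-last, category `i` stays at position `i` of the output vector, and the category encoded as all zeros is the least frequent one. With drop-first, category `i` lands at position `i - 1`, and the most frequent category becomes the all-zeros reference.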