[ https://issues.apache.org/jira/browse/SPARK-20619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Felix Cheung resolved SPARK-20619.
----------------------------------
Resolution: Fixed
Assignee: Wayne Zhang
Fix Version/s: 2.3.0
Target Version/s: 2.3.0
> StringIndexer supports multiple ways of label ordering
> ------------------------------------------------------
>
> Key: SPARK-20619
> URL: https://issues.apache.org/jira/browse/SPARK-20619
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Affects Versions: 2.1.0
> Reporter: Wayne Zhang
> Assignee: Wayne Zhang
> Fix For: 2.3.0
>
>
> StringIndexer currently maps labels to numeric indices in descending order
> of label frequency. Other orderings (e.g., alphabetical) may be needed in
> feature ETL. For example, the ordering affects the results of one-hot
> encoding and RFormula. I propose supporting other ordering methods by adding
> a parameter, stringOrderType, that accepts the following four options:
> - 'freq_desc': descending order by label frequency (most frequent label
>   assigned 0)
> - 'freq_asc': ascending order by label frequency (least frequent label
>   assigned 0)
> - 'alphabet_desc': descending alphabetical order
> - 'alphabet_asc': ascending alphabetical order
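
The four proposed orderings can be sketched in plain Python, independent of Spark. This is only an illustration under stated assumptions, not the StringIndexer implementation: the helper name build_label_index is hypothetical, and breaking frequency ties alphabetically is an assumption, since the description above does not say how ties are handled.

```python
from collections import Counter


def build_label_index(labels, order_type="freq_desc"):
    """Map each distinct label to an integer index (hypothetical helper).

    order_type uses the option strings from the issue description.
    Frequency ties are broken alphabetically, which is an assumption.
    """
    counts = Counter(labels)
    if order_type == "freq_desc":
        # Most frequent label gets index 0.
        ordered = sorted(counts, key=lambda label: (-counts[label], label))
    elif order_type == "freq_asc":
        # Least frequent label gets index 0.
        ordered = sorted(counts, key=lambda label: (counts[label], label))
    elif order_type == "alphabet_desc":
        # Alphabetically last label gets index 0.
        ordered = sorted(counts, reverse=True)
    elif order_type == "alphabet_asc":
        # Alphabetically first label gets index 0.
        ordered = sorted(counts)
    else:
        raise ValueError("unsupported order_type: %r" % (order_type,))
    return {label: index for index, label in enumerate(ordered)}


# "b" occurs three times, "a" twice, "c" once.
sample_labels = ["b", "a", "b", "c", "b", "a"]
for option in ("freq_desc", "freq_asc", "alphabet_desc", "alphabet_asc"):
    print(option, build_label_index(sample_labels, option))
```

For this sample, 'freq_desc' assigns b=0, a=1, c=2, while 'alphabet_asc' assigns a=0, b=1, c=2, so any one-hot encoding built on the index changes with the option chosen. The option strings here follow the issue description; the released parameter may spell them differently, so check the StringIndexer API documentation for the Spark version in use.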