[ https://issues.apache.org/jira/browse/SPARK-20619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wayne Zhang updated SPARK-20619:
--------------------------------
Description:
StringIndexer maps labels to numeric indices in descending order of label
frequency. Other orderings (e.g., alphabetical) may be needed in feature ETL;
for example, the ordering affects the results of one-hot encoding and
RFormula. Proposal: support other ordering methods by adding a parameter,
stringOrderType, with the following four options:
- 'freq_desc': descending order by label frequency (most frequent label
  assigned 0)
- 'freq_asc': ascending order by label frequency (least frequent label
  assigned 0)
- 'alphabet_desc': descending alphabetical order
- 'alphabet_asc': ascending alphabetical order
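
To make the four options concrete, here is a minimal plain-Python sketch of the index each one would produce. It is only an illustration, not Spark code: the helper name build_label_index is hypothetical, and breaking frequency ties alphabetically is an assumption, since the proposal does not say how ties are resolved.

```python
from collections import Counter


def build_label_index(labels, string_order_type="freq_desc"):
    """Map each distinct label to an integer index under one proposed ordering.

    Hypothetical helper for illustration only; not part of Spark.
    """
    counts = Counter(labels)
    if string_order_type == "freq_desc":
        # Most frequent label gets 0. Alphabetical tie-break is an assumption.
        ordered = sorted(counts, key=lambda label: (-counts[label], label))
    elif string_order_type == "freq_asc":
        # Least frequent label gets 0. Alphabetical tie-break is an assumption.
        ordered = sorted(counts, key=lambda label: (counts[label], label))
    elif string_order_type == "alphabet_desc":
        ordered = sorted(counts, reverse=True)
    elif string_order_type == "alphabet_asc":
        ordered = sorted(counts)
    else:
        raise ValueError("unsupported stringOrderType: %r" % (string_order_type,))
    return {label: index for index, label in enumerate(ordered)}


labels = ["b", "a", "b", "c", "b", "a"]  # counts: b=3, a=2, c=1
for option in ("freq_desc", "freq_asc", "alphabet_desc", "alphabet_asc"):
    print(option, build_label_index(labels, option))
```

With this input, 'freq_desc' assigns 0 to "b" while 'alphabet_asc' assigns 0 to "a", so any downstream step that depends on the index order (such as one-hot encoding) would see different results.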
was:
StringIndexer maps labels to numbers according to the descending order of label
frequency. Other types of ordering (e.g., alphabetical) may be needed in
feature ETL, for example, in one-hot encoding. Propose to support alphabetic
order, and ascending order of label frequency. For example, add a parameter
stringOrderType to control how string is ordered which supports four options:
- 'freq_desc': descending order by label frequency (most frequent label
assigned 0)
- 'freq_asc': ascending order by label frequency (least frequent label
assigned 0)
- 'alphabet_desc': descending alphabetical order
- 'alphabet_asc': ascending alphabetical order
> StringIndexer supports multiple ways of label ordering
> ------------------------------------------------------
>
> Key: SPARK-20619
> URL: https://issues.apache.org/jira/browse/SPARK-20619
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Affects Versions: 2.1.0
> Reporter: Wayne Zhang