Xiangrui Meng created SPARK-5886:
------------------------------------
Summary: Add LabelIndexer
Key: SPARK-5886
URL: https://issues.apache.org/jira/browse/SPARK-5886
Project: Spark
Issue Type: Sub-task
Components: ML
Reporter: Xiangrui Meng
`LabelIndexer` takes a column of labels (raw categories) and outputs an integer
column with labels indexed by their frequency.
{code}
val li = new LabelIndexer()
  .setInputCol("country")
  .setOutputCol("countryIndex")
{code}
In the output column, we should store the label-to-index map as an ML
attribute. Indices should be assigned in order of descending frequency, so
that the most frequent label gets index 0. This maximizes the number of zeros
in the output column and so enhances sparsity.
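To make the intended ordering concrete, here is a minimal sketch of frequency-based indexing on plain Scala collections. It is not the proposed `LabelIndexer` implementation and does not use Spark. The names `LabelIndexSketch`, `fitIndex` and `transform` are hypothetical, and breaking ties alphabetically is an assumption that the description above does not specify.

```scala
// Illustrative sketch only: hypothetical helper names, no Spark dependency.
object LabelIndexSketch {

  // Build a label -> index map where the most frequent label gets index 0.
  def fitIndex(labels: Seq[String]): Map[String, Int] = {
    // Count occurrences of each distinct label.
    val counts = labels.groupBy(identity).map { case (label, xs) => (label, xs.size) }
    counts.toSeq
      // Sort by descending count; ties broken alphabetically (an assumption).
      .sortBy { case (label, count) => (-count, label) }
      .map { case (label, _) => label }
      .zipWithIndex
      .toMap
  }

  // Replace each label by its index. Throws NoSuchElementException on a
  // label that was not seen by fitIndex.
  def transform(labels: Seq[String], index: Map[String, Int]): Seq[Int] =
    labels.map(index)

  def main(args: Array[String]): Unit = {
    val countries = Seq("US", "CN", "US", "FR", "US", "CN")
    val index = fitIndex(countries) // US -> 0, CN -> 1, FR -> 2
    println(index)
    println(transform(countries, index))
  }
}
```

In this example "US" occurs three times, so it maps to 0, and half of the output values are zero. A real transformer would also need to decide how to handle labels unseen at fit time, which this sketch leaves as an exception.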