Xiangrui Meng created SPARK-5886:
------------------------------------

             Summary: Add LabelIndexer
                 Key: SPARK-5886
                 URL: https://issues.apache.org/jira/browse/SPARK-5886
             Project: Spark
          Issue Type: Sub-task
          Components: ML
            Reporter: Xiangrui Meng


`LabelIndexer` takes a column of labels (raw categories) and outputs an integer 
column with labels indexed by their frequency.

{code}
val li = new LabelIndexer()
  .setInputCol("country")
  .setOutputCol("countryIndex")
{code}

In the output column, we should store the label-to-index map as an ML 
attribute. Indices should be assigned in descending order of frequency, so 
that the most frequent label gets index 0, which enhances sparsity.
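
To illustrate the intended ordering, here is a minimal sketch in plain Scala, without Spark. It is not the proposed implementation: `LabelIndexSketch` and `buildIndex` are hypothetical names, and breaking ties between equally frequent labels alphabetically is an assumption that the issue does not specify.

```scala
// Hypothetical helper, not part of Spark: builds a label-to-index map
// in which more frequent labels receive smaller indices.
object LabelIndexSketch {
  def buildIndex(labels: Seq[String]): Map[String, Int] = {
    // Count how often each label occurs.
    val counts = labels.groupBy(identity).map {
      case (label, occurrences) => (label, occurrences.size)
    }
    counts.toSeq
      // Sort by descending count; ties are broken alphabetically (assumption).
      .sortBy { case (label, count) => (-count, label) }
      .map { case (label, _) => label }
      .zipWithIndex
      .toMap
  }

  def main(args: Array[String]): Unit = {
    val countries = Seq("US", "CN", "US", "FR", "CN", "US")
    val index = buildIndex(countries)
    // Replace each raw label with its index.
    val indexed = countries.map(index)
    println(index)
    println(indexed)
  }
}
```

Because the most frequent label maps to 0, a column dominated by one label becomes mostly zeros, which is what makes the indexed output friendlier to sparse representations.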



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
