[ 
https://issues.apache.org/jira/browse/SPARK-5886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-5886:
---------------------------------
    Description: 
`LabelIndexer` takes a column of labels (raw categories) and outputs an integer 
column with labels indexed by their frequency.

{code}
val li = new LabelIndexer()
  .setInputCol("country")
  .setOutputCol("countryIndex")
{code}

In the output column, we should store the label-to-index map as an ML 
attribute. Indices should be ordered by frequency, with the most frequent 
label getting index 0, so that the most common value is stored as zero, 
which enhances sparsity.
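
To make the frequency ordering concrete, here is a minimal sketch in plain 
Scala, without Spark. It is not the proposed `LabelIndexer` implementation: 
`LabelIndexSketch` and `buildIndex` are hypothetical names, and breaking ties 
alphabetically is an assumption that the description does not specify.

{code}
// Hypothetical helper, not part of Spark: builds a label -> index map
// in which more frequent labels receive smaller indices.
object LabelIndexSketch {
  def buildIndex(labels: Seq[String]): Map[String, Int] = {
    // Count the occurrences of each distinct label.
    val counts = labels.groupBy(identity).map { case (label, xs) => (label, xs.size) }
    counts.toSeq
      // Sort by descending count; ties are broken alphabetically (an assumption).
      .sortBy { case (label, count) => (-count, label) }
      .map { case (label, _) => label }
      .zipWithIndex
      .toMap
  }

  def main(args: Array[String]): Unit = {
    val countries = Seq("US", "CN", "US", "FR", "CN", "US")
    val index = buildIndex(countries)   // US -> 0, CN -> 1, FR -> 2
    val indexed = countries.map(index)  // 0, 1, 0, 2, 1, 0
    println(index)
    println(indexed)
  }
}
{code}

Because "US" is the most frequent label, it maps to 0, so half of the indexed 
values in this toy column are zeros.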

We can discuss whether this should index multiple columns at the same time.

  was:
`LabelIndexer` takes a column of labels (raw categories) and outputs an integer 
column with labels indexed by their frequency.

{code}
val li = new LabelIndexer()
  .setInputCol("country")
  .setOutputCol("countryIndex")
{code}

In the output column, we should store the label to index map as an ML 
attribute. The index should be ordered by frequency, where the most frequent 
label gets index 0, to enhance sparsity.


> Add LabelIndexer
> ----------------
>
>                 Key: SPARK-5886
>                 URL: https://issues.apache.org/jira/browse/SPARK-5886
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML
>            Reporter: Xiangrui Meng
>
> `LabelIndexer` takes a column of labels (raw categories) and outputs an 
> integer column with labels indexed by their frequency.
> {code}
> val li = new LabelIndexer()
>   .setInputCol("country")
>   .setOutputCol("countryIndex")
> {code}
> In the output column, we should store the label-to-index map as an ML 
> attribute. Indices should be ordered by frequency, with the most frequent 
> label getting index 0, so that the most common value is stored as zero, 
> which enhances sparsity.
> We can discuss whether this should index multiple columns at the same time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
