[ 
https://issues.apache.org/jira/browse/SPARK-15957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yanbo Liang updated SPARK-15957:
--------------------------------
    Description: 
RFormula indexes the label only when it is of string type. If the label is of 
numeric type and we use RFormula to specify a classification model, we cannot 
extract label attributes from the label column metadata. The label attributes 
are useful when making predictions for classification, so for classification 
we can force indexing of the label with {{StringIndexer}} whether it is of 
numeric or string type. Then SparkR wrappers can extract label attributes from 
the column metadata successfully. This feature can help us fix bugs similar to 
SPARK-15153.
For regression, we will still keep the label as numeric type.
We should add a param to RFormula to control whether to force indexing of the label.

  was:
RFormula will index label only when it is string type. If the label is numeric 
type and we use RFormula to present a classification model, we can not extract 
label attributes from the label column metadata successfully. The label 
attributes are useful when make prediction for classification, so we can force 
to index label by {StringIndexer} whether it is numeric or string type for 
classification. Then SparkR wrappers can extract label attributes from the 
column metadata successfully. This feature can help us to fix bug similar with 
SPARK-15153.
For regression, we will still to keep label as numeric type.
We should add a param to control whether to force to index label for RFormula.


> RFormula supports forcing to index label
> ----------------------------------------
>
>                 Key: SPARK-15957
>                 URL: https://issues.apache.org/jira/browse/SPARK-15957
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>            Reporter: Yanbo Liang
>            Assignee: Yanbo Liang
>
> RFormula indexes the label only when it is of string type. If the label is 
> of numeric type and we use RFormula to specify a classification model, we 
> cannot extract label attributes from the label column metadata. The label 
> attributes are useful when making predictions for classification, so for 
> classification we can force indexing of the label with {{StringIndexer}} 
> whether it is of numeric or string type. Then SparkR wrappers can extract 
> label attributes from the column metadata successfully. This feature can 
> help us fix bugs similar to SPARK-15153.
> For regression, we will still keep the label as numeric type.
> We should add a param to RFormula to control whether to force indexing of the label.
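
To illustrate why forcing the index helps, here is a minimal plain-Python sketch. It is not Spark code and not the proposed implementation. The function names (fit_label_index, index_labels) are hypothetical, and the ordering rule (most frequent label first, ties broken by string value) is an assumption of this sketch. It shows that once numeric labels are indexed, the list of original label values is kept alongside the indices and a predicted index can be mapped back to a label.

```python
from collections import Counter


def fit_label_index(labels):
    """Return the distinct labels, as strings, in index order.

    Labels are cast to strings first, so numeric and string labels are
    treated the same way. More frequent labels come first; ties are broken
    by the string value (an assumption of this sketch).
    """
    counts = Counter(str(label) for label in labels)
    return sorted(counts, key=lambda value: (-counts[value], value))


def index_labels(labels, ordered):
    """Replace each label with its position in `ordered`, as a float."""
    position = {value: float(i) for i, value in enumerate(ordered)}
    return [position[str(label)] for label in labels]


# Numeric class labels, as in the classification case described above.
raw = [3, 7, 3, 3, 7, 5]

# The ordered label list plays the role of the label attributes that
# would be stored in the column metadata.
label_attributes = fit_label_index(raw)
indexed = index_labels(raw, label_attributes)

# A model predicts an index; the stored attributes recover the label.
predicted_index = 1.0
original = label_attributes[int(predicted_index)]
```

Without the indexing step, a numeric label column carries no such list, which is the gap this issue describes.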


