[jira] [Commented] (SPARK-9230) SparkR RFormula should support StringType features

Shivaram Venkataraman (JIRA) Tue, 21 Jul 2015 13:54:04 -0700

    [ https://issues.apache.org/jira/browse/SPARK-9230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635796#comment-14635796 ]


Shivaram Venkataraman commented on SPARK-9230:
----------------------------------------------

The thing to do there would be to capture it as SparkR DataFrame columns, so 
that df$Sepal_Width actually resolves to a Java column class, which we can 
then parse in RFormula. In some sense we would have two constructors: one 
from strings and one from DataFrame columns.
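
A rough sketch of the two-constructor idea, in plain Python rather than 
SparkR or Spark's Scala code. The names Column, Formula, from_string and 
from_columns are hypothetical and only illustrate the shape of the design; 
they are not the actual RFormula API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Column:
    """Hypothetical stand-in for a resolved DataFrame column (e.g. df$Sepal_Width)."""
    name: str


@dataclass(frozen=True)
class Formula:
    """Hypothetical parsed formula: one label plus a list of feature terms."""
    label: str
    terms: List[str]

    @classmethod
    def from_string(cls, text: str) -> "Formula":
        # Split "label ~ term1 + term2" into its left and right sides.
        lhs, rhs = text.split("~", 1)
        terms = [t.strip() for t in rhs.split("+") if t.strip()]
        return cls(lhs.strip(), terms)

    @classmethod
    def from_columns(cls, label: Column, features: List[Column]) -> "Formula":
        # Columns already carry resolved names, so no string parsing is needed.
        return cls(label.name, [c.name for c in features])


a = Formula.from_string("Sepal_Length ~ Sepal_Width + Species")
b = Formula.from_columns(
    Column("Sepal_Length"), [Column("Sepal_Width"), Column("Species")]
)
```

Both constructors end up with the same internal representation, so anything 
downstream of the formula would not need to know which form the user wrote.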

> SparkR RFormula should support StringType features
> --------------------------------------------------
>
>                 Key: SPARK-9230
>                 URL: https://issues.apache.org/jira/browse/SPARK-9230
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML, SparkR
>            Reporter: Eric Liang
>
> StringType features will need to be encoded using OneHotEncoder to be used 
> for regression. See umbrella design doc 
> https://docs.google.com/document/d/10NZNSEurN2EdWM31uFYsgayIPfCFHiuIu3pCWrUmP_c/edit?usp=sharing
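
For readers unfamiliar with the encoding step mentioned in the description, 
here is a minimal plain-Python illustration of one-hot encoding a string 
feature. It is not Spark's OneHotEncoder, whose input format and defaults may 
differ; the helper names fit_index and one_hot are made up for this example.

```python
from typing import Dict, List


def fit_index(values: List[str]) -> Dict[str, int]:
    # Assign each distinct category an index, sorted for determinism.
    return {v: i for i, v in enumerate(sorted(set(values)))}


def one_hot(value: str, index: Dict[str, int]) -> List[float]:
    # Build a vector of zeros with a single 1.0 at the category's index.
    vec = [0.0] * len(index)
    vec[index[value]] = 1.0
    return vec


species = ["setosa", "virginica", "setosa", "versicolor"]
index = fit_index(species)
encoded = [one_hot(s, index) for s in species]
```

Each string value becomes a numeric vector, which is the form a regression 
model can consume.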


