[ https://issues.apache.org/jira/browse/SPARK-15767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15334721#comment-15334721 ]
Joseph K. Bradley commented on SPARK-15767:
-------------------------------------------
[~vectorijk] Notes from sync: Can you please write more about the possible
APIs? I'd like to do a comparison of:
* the rpart API
* the MLlib DecisionTreeClassifier and DecisionTreeRegressor APIs
The comparison should list all parameters and their meaning. The idea is to
figure out which of the following we can do:
* Best option: Mimic rpart exactly so that R users can switch to spark.rpart
easily.
* Worst option: Sort of mimic rpart, but not exactly because of a difference in
functionality, such as new parameters from MLlib or differences in behavior.
* Medium option: Avoid rpart API, and instead offer APIs matching
DecisionTreeClassifier and DecisionTreeRegressor in the Scala/Java/Python APIs.
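
As a possible starting point for that comparison, here is a rough sketch in Python (used only as a neutral way to tabulate; it is not part of SparkR and calls neither library). It lines up the arguments of rpart.control against the Param names on the spark.ml decision tree estimators. The pairings, the defaults, and the helper name classify are assumptions based on my reading of the two packages' documentation and should be checked against the versions in use. In particular, cp and minInfoGain are only loose analogues, not equivalent settings.

```python
# Hypothetical side-by-side table of tree parameters. Names and defaults are
# assumptions taken from the rpart and spark.ml docs; verify before relying on them.

# rpart.control argument -> (documented default, closest spark.ml Param or None)
RPART_CONTROL = {
    "minsplit": (20, None),
    "minbucket": ("round(minsplit/3)", "minInstancesPerNode"),
    "cp": (0.01, "minInfoGain"),  # loose analogue only, semantics differ
    "maxcompete": (4, None),
    "maxsurrogate": (5, None),
    "usesurrogate": (2, None),
    "xval": (10, None),
    "surrogatestyle": (0, None),
    "maxdepth": (30, "maxDepth"),
}

# spark.ml DecisionTreeRegressor Params -> default
# (the classifier uses "gini" as its default impurity instead of "variance")
MLLIB_TREE_PARAMS = {
    "maxDepth": 5,
    "maxBins": 32,
    "minInstancesPerNode": 1,
    "minInfoGain": 0.0,
    "maxMemoryInMB": 256,
    "cacheNodeIds": False,
    "checkpointInterval": 10,
    "impurity": "variance",
    "seed": None,
}


def classify(rpart_control, mllib_params):
    """Split parameters into: paired, rpart-only, and MLlib-only."""
    mapped = {r: m for r, (_, m) in rpart_control.items() if m is not None}
    rpart_only = sorted(r for r, (_, m) in rpart_control.items() if m is None)
    mllib_only = sorted(set(mllib_params) - set(mapped.values()))
    return mapped, rpart_only, mllib_only


if __name__ == "__main__":
    mapped, rpart_only, mllib_only = classify(RPART_CONTROL, MLLIB_TREE_PARAMS)
    print("Paired (rpart -> MLlib):", mapped)
    print("rpart only:", rpart_only)
    print("MLlib only:", mllib_only)
```

The more entries that end up in the "rpart only" and "MLlib only" groups, the harder the "mimic rpart exactly" option becomes, since each one is either a feature we cannot offer or a parameter with no place in the rpart signature.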
> Decision Tree Regression wrapper in SparkR
> ------------------------------------------
>
> Key: SPARK-15767
> URL: https://issues.apache.org/jira/browse/SPARK-15767
> Project: Spark
> Issue Type: New Feature
> Components: ML, SparkR
> Reporter: Kai Jiang
> Assignee: Kai Jiang
>
> Implement a wrapper in SparkR to support decision tree regression. R's native
> decision tree regression implementation is from the rpart package, with signature
> rpart(formula, dataframe, method="anova"). I propose we implement an API
> like spark.decisionTreeRegression(dataframe, formula, ...). After
> implementing decision tree classification, we could refactor the two into an
> API more like rpart().