[ 
https://issues.apache.org/jira/browse/SPARK-15767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15344995#comment-15344995
 ] 

Kai Jiang commented on SPARK-15767:
-----------------------------------

rpart API:
rpart(formula, data, weights, subset, na.action = na.rpart, method,
      model = FALSE, x = FALSE, y = TRUE, parms, control, cost, ...)
http://www.inside-r.org/r-doc/rpart/rpart

Comparison between MLlib DecisionTree API and rpart API:
1. algorithm
MLlib uses separate `Classifier` and `Regressor` classes to select the algorithm.
rpart uses the `method` argument to specify the algorithm: "anova" selects 
`regression` and "class" selects `classification`.
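
To make the difference concrete, here is a minimal sketch of how an rpart-style `method` string could be dispatched to an MLlib estimator. The helper `estimator_for_method` and the lookup table are hypothetical names used only for illustration; they are not part of SparkR or MLlib. The two class names are the existing spark.ml decision tree estimators. The other rpart methods ("poisson", "exp") are deliberately rejected because this thread does not discuss them.

```python
# Hypothetical lookup table (illustration only, not an existing SparkR API).
# Keys are rpart `method` values; values are spark.ml estimator class names.
MLLIB_ESTIMATORS = {
    "anova": "org.apache.spark.ml.regression.DecisionTreeRegressor",
    "class": "org.apache.spark.ml.classification.DecisionTreeClassifier",
}


def estimator_for_method(method):
    """Return the MLlib estimator class name for an rpart-style method string."""
    try:
        return MLLIB_ESTIMATORS[method]
    except KeyError:
        # rpart also accepts "poisson" and "exp"; no mapping is assumed here.
        raise ValueError("unsupported method: %r" % (method,))


if __name__ == "__main__":
    print(estimator_for_method("anova"))
    print(estimator_for_method("class"))
```

A single entry point that takes `method` would let the R-facing API stay close to rpart while the backend still picks between the two MLlib classes.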

2. parameters
rpart passes its parameters through the `control` argument: minsplit, 
minbucket, cp (complexity parameter), maxcompete (the number of competitor 
splits retained in the output), maxsurrogate (the number of surrogate splits 
retained in the output), usesurrogate, xval (number of cross-validations), 
surrogatestyle (controls the selection of a best surrogate), maxdepth.
MLlib API: maxDepth, maxBins, minInstancesPerNode, minInfoGain.
Thus, most of the MLlib parameters have an rpart counterpart (for example, 
maxDepth and maxdepth).
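
As a rough sketch of how the parameter lists above might line up, the snippet below translates an rpart-style `control` list into MLlib parameter names. The helper `translate_control` and the table `RPART_TO_MLLIB` are hypothetical. The two pairings shown (maxdepth to maxDepth, minbucket to minInstancesPerNode) are my assumption about the closest matches, not something settled in this issue. Everything else is reported as unmapped rather than guessed at.

```python
# Hypothetical name mapping (an assumption for illustration, not a SparkR API).
RPART_TO_MLLIB = {
    "maxdepth": "maxDepth",
    "minbucket": "minInstancesPerNode",
}


def translate_control(control):
    """Split rpart-style control options into (mapped, unmapped) dicts.

    `mapped` uses MLlib parameter names; `unmapped` keeps the rpart names
    for options with no assumed MLlib counterpart.
    """
    mapped = {}
    unmapped = {}
    for name, value in control.items():
        if name in RPART_TO_MLLIB:
            mapped[RPART_TO_MLLIB[name]] = value
        else:
            unmapped[name] = value
    return mapped, unmapped


if __name__ == "__main__":
    mapped, unmapped = translate_control(
        {"maxdepth": 5, "minbucket": 7, "xval": 10}
    )
    print(mapped)
    print(unmapped)
```

Returning the unmapped options separately makes it easy for a wrapper to warn about rpart arguments (such as xval or the surrogate settings) that it cannot honor.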

3. summary
For this part, we could simply mimic what rpart does in R.

So, I think we should mimic the rpart API.

> Decision Tree Regression wrapper in SparkR
> ------------------------------------------
>
>                 Key: SPARK-15767
>                 URL: https://issues.apache.org/jira/browse/SPARK-15767
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML, SparkR
>            Reporter: Kai Jiang
>            Assignee: Kai Jiang
>
> Implement a wrapper in SparkR to support decision tree regression. R's native 
> Decision Tree Regression implementation is from the rpart package, with the 
> signature rpart(formula, dataframe, method="anova"). I propose we implement 
> an API like spark.rpart(dataframe, formula, ...). After implementing 
> decision tree classification, we could refactor the two into an API more 
> like rpart().



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
