[ https://issues.apache.org/jira/browse/SPARK-7129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14908455#comment-14908455 ]

Seth Hendrickson commented on SPARK-7129:
-----------------------------------------

A couple of quick comments:
* The design doc implies that we will have several different boosting 
predictors, whereas I initially thought this JIRA called for a single generic 
boosting predictor. So it seems we'll have {{AdaBoostClassifier}}, 
{{LogitBoostClassifier}}, and {{GradientBoostClassifier}} as separate boosting 
implementations instead of a single {{BoostedClassifier}} implementation with 
a param like {{setAlgo("AdaBoost")}}. Personally, I think a single generic 
implementation makes less sense, so I prefer the separation into different 
algorithms, but I wanted to clarify.
* What are the base learners in the design doc? It looks like you propose to 
create a new {{Learner}} class. How will that interact with existing 
predictors? 
* I think {{AdaBoostClassifier}} is better than {{SAMMEClassifier}} since it is 
the classification analogue of {{AdaBoostRegressor}}, and it keeps us in line 
with the scikit-learn API.
* Is {{setNumberOfBaseLearners}} equivalent to setting the number of boosting 
iterations? I ask because in the R mboost package, they accept a set of P 
candidate base learners where, at each boosting iteration, they train each one 
and select only the "best" base learner. If this were the case, we would want 
to allow the user to specify multiple base learners. It seems as if we will 
not be doing that under the proposed architecture. Just want to clarify.
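To make the mboost-style scheme in the last bullet concrete, here is a toy sketch in plain Python (not Spark code): at each boosting iteration, every candidate base learner is trained on the current residuals and only the one with the lowest squared loss is kept. The stump learner, the fixed thresholds, and all names here are illustrative assumptions, not proposed spark.ml API.

```python
def fit_stump(xs, residuals, threshold):
    # Hypothetical one-split base learner: predict the mean residual
    # on each side of a fixed threshold.
    left = [r for x, r in zip(xs, residuals) if x <= threshold]
    right = [r for x, r in zip(xs, residuals) if x > threshold]
    lmean = sum(left) / len(left) if left else 0.0
    rmean = sum(right) / len(right) if right else 0.0
    return lambda x: lmean if x <= threshold else rmean

def boost(xs, ys, thresholds, iters=10, lr=0.5):
    # Greedy stagewise fitting: each iteration trains all P candidates
    # (one stump per threshold) and keeps only the best one.
    ensemble = []
    preds = [0.0] * len(xs)
    for _ in range(iters):
        residuals = [y - p for y, p in zip(ys, preds)]
        best, best_loss = None, float("inf")
        for t in thresholds:
            h = fit_stump(xs, residuals, t)
            loss = sum((r - h(x)) ** 2 for x, r in zip(xs, residuals))
            if loss < best_loss:
                best, best_loss = h, loss
        ensemble.append(best)
        preds = [p + lr * best(x) for p, x in zip(preds, xs)]
    return lambda x: sum(lr * h(x) for h in ensemble)
```

If the proposed architecture fixes a single base learner, the inner loop over candidates collapses to one fit per iteration, which is why the distinction matters for the param design.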

> Add generic boosting algorithm to spark.ml
> ------------------------------------------
>
>                 Key: SPARK-7129
>                 URL: https://issues.apache.org/jira/browse/SPARK-7129
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML
>            Reporter: Joseph K. Bradley
>
> The Pipelines API will make it easier to create a generic Boosting algorithm 
> which can work with any Classifier or Regressor. Creating this feature will 
> require researching the possible variants and extensions of boosting which we 
> may want to support now and/or in the future, and planning an API which will 
> be properly extensible.
> In particular, it will be important to think about supporting:
> * multiple loss functions (for AdaBoost, LogitBoost, gradient boosting, etc.)
> * multiclass variants
> * multilabel variants (which will probably be in a separate class and JIRA)
> * For more esoteric variants, we should consider them but not design too much 
> around them: totally corrective boosting, cascaded models
> Note: This may interact some with the existing tree ensemble methods, but it 
> should be largely separate since the tree ensemble APIs and implementations 
> are specialized for trees.
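As a rough illustration of the "generic boosting with multiple loss functions" idea in the description above, the sketch below (plain Python, illustrative names only, not proposed spark.ml API) uses one gradient-boosting loop whose behavior is controlled entirely by a pluggable negative-gradient function: squared loss gives L2 boosting, log loss gives LogitBoost-style fits.

```python
import math

def neg_grad_squared(y, f):
    # Negative gradient of squared loss: the residual.
    return y - f

def neg_grad_logistic(y, f):
    # Negative gradient of log loss for labels y in {0, 1}.
    return y - 1.0 / (1.0 + math.exp(-f))

def fit_stump(xs, targets):
    # Hypothetical base learner: mean target on each side of x <= 1.5.
    left = [t for x, t in zip(xs, targets) if x <= 1.5]
    right = [t for x, t in zip(xs, targets) if x > 1.5]
    lmean = sum(left) / len(left) if left else 0.0
    rmean = sum(right) / len(right) if right else 0.0
    return lambda x: lmean if x <= 1.5 else rmean

def generic_boost(xs, ys, neg_grad, fit_base=fit_stump, iters=50, lr=0.3):
    # One loop covers several boosting flavors: each iteration fits the
    # base learner to the negative gradient of the chosen loss.
    preds = [0.0] * len(xs)
    models = []
    for _ in range(iters):
        pseudo = [neg_grad(y, f) for y, f in zip(ys, preds)]
        h = fit_base(xs, pseudo)
        models.append(h)
        preds = [f + lr * h(x) for f, x in zip(preds, xs)]
    return lambda x: sum(lr * h(x) for h in models)
```

This is the sense in which a single generic implementation could cover AdaBoost-, LogitBoost-, and gradient-boosting-style behavior via a loss param, which is the design trade-off discussed in the comments above.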



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
