[ 
https://issues.apache.org/jira/browse/SPARK-7129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15262991#comment-15262991
 ] 

Seth Hendrickson commented on SPARK-7129:
-----------------------------------------

Creating this initially as a Spark package is not a bad idea, but in this 
particular case a couple of things stand in the way. First, we need the base 
classifiers to support instance weights, which currently only 
LogisticRegression does. I have a PR ready for trees, so they should not be 
far off. Second, AdaBoost, for example, predicts by aggregating the 
predictions of the individual models in the ensemble. Spark ML models have 
protected predict methods, so other classes cannot call them, let alone code 
outside of Spark entirely. [There is a 
Jira|https://issues.apache.org/jira/browse/SPARK-10413] for making the predict 
methods public, so a package may need to wait on that. If this boosting 
framework went directly into Spark, we could make the methods private[ml] 
instead. I appreciate any feedback, and I'll try to work on these blocking 
issues for now.
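
To make the two blockers concrete, here is a minimal sketch of discrete 
AdaBoost in plain Python. It is not Spark code and not the proposed API: 
WeightedStump, adaboost_fit, and adaboost_predict are hypothetical names 
invented for illustration, and labels are assumed to be +1/-1. The base 
learner has to accept per-instance weights in fit, and the ensemble has to 
call each member's predict directly when aggregating.

```python
import math


class WeightedStump:
    """Hypothetical base learner: a threshold on one feature, trained with weights."""

    def fit(self, xs, ys, weights):
        best = None
        for threshold in sorted(set(xs)):
            for sign in (1, -1):
                preds = [sign if x >= threshold else -sign for x in xs]
                # Weighted error: this is why the base learner needs weight support.
                err = sum(w for p, y, w in zip(preds, ys, weights) if p != y)
                if best is None or err < best[0]:
                    best = (err, threshold, sign)
        self.error, self.threshold, self.sign = best
        return self

    def predict(self, x):
        return self.sign if x >= self.threshold else -self.sign


def adaboost_fit(xs, ys, num_rounds=5):
    """Train num_rounds stumps, reweighting instances after each round."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(num_rounds):
        stump = WeightedStump().fit(xs, ys, weights)
        # Clamp the error so the log below stays finite for a perfect learner.
        err = min(max(stump.error, 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Increase the weight of misclassified points, decrease the rest.
        weights = [
            w * math.exp(-alpha * y * stump.predict(x))
            for x, y, w in zip(xs, ys, weights)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def adaboost_predict(ensemble, x):
    # The ensemble must be able to call each member's predict method.
    score = sum(alpha * stump.predict(x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1
```

In a Spark package, the call to stump.predict inside adaboost_predict is the 
step that would not compile today, since the corresponding method on Spark ML 
models is protected.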

> Add generic boosting algorithm to spark.ml
> ------------------------------------------
>
>                 Key: SPARK-7129
>                 URL: https://issues.apache.org/jira/browse/SPARK-7129
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML
>            Reporter: Joseph K. Bradley
>
> The Pipelines API will make it easier to create a generic Boosting algorithm 
> which can work with any Classifier or Regressor. Creating this feature will 
> require researching the possible variants and extensions of boosting which we 
> may want to support now and/or in the future, and planning an API which will 
> be properly extensible.
> In particular, it will be important to think about supporting:
> * multiple loss functions (for AdaBoost, LogitBoost, gradient boosting, etc.)
> * multiclass variants
> * multilabel variants (which will probably be in a separate class and JIRA)
> * For more esoteric variants, we should consider them but not design too much 
> around them: totally corrective boosting, cascaded models
> Note: This may interact some with the existing tree ensemble methods, but it 
> should be largely separate since the tree ensemble APIs and implementations 
> are specialized for trees.



