[ https://issues.apache.org/jira/browse/SPARK-7129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14903136#comment-14903136 ]

Joseph K. Bradley commented on SPARK-7129:
------------------------------------------

It's not really on the roadmap for 1.6, so I shouldn't make promises.  The main 
issue for me is a set of open design questions:
* Should boosting depend on the prediction abstractions (Classifier, Regressor, 
etc.)?  If so, are those abstractions sufficient, or should they be turned into 
traits?

If you're interested, it would be valuable to get your input on designing the 
abstractions.  Would you be able to write a short design doc?  I figure we 
should:
* List the boosting algorithms of interest
* List what requirements those algorithms place on the base learner
* Design minimal abstractions which describe those requirements
* See how those abstractions compare with MLlib's current abstractions, and if 
we need to rethink them
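
To make the second step concrete, here is a minimal, self-contained sketch of discrete AdaBoost over decision stumps (plain Python rather than Spark code; the helper names are hypothetical).  It illustrates that this family of algorithms demands only two things from the base learner: (a) fitting under per-instance weights and (b) predicting labels in {-1, +1}.

```python
import math

def fit_stump(X, y, w):
    """Fit a 1-D decision stump minimizing weighted 0/1 error.
    This is the weight-aware base learner that boosting requires."""
    best = None
    for t in sorted(set(X)):
        for pol in (1, -1):
            err = sum(wi for xi, yi, wi in zip(X, y, w)
                      if pol * (1 if xi > t else -1) != yi)
            if best is None or err < best[0]:
                best = (err, t, pol)
    _, t, pol = best
    return (t, pol)

def stump_predict(stump, x):
    t, pol = stump
    return pol * (1 if x > t else -1)

def adaboost(X, y, rounds=10):
    """Discrete AdaBoost: reweight examples each round, then combine
    the base learners by their weighted vote."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = fit_stump(X, y, w)
        preds = [stump_predict(stump, xi) for xi in X]
        err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
        err = min(max(err, 1e-10), 1 - 1e-10)  # clamp to keep log finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Upweight misclassified examples, downweight correct ones.
        w = [wi * math.exp(-alpha * yi * p) for wi, yi, p in zip(w, y, preds)]
        z = sum(w)
        w = [wi / z for wi in w]
    return ensemble

def predict(ensemble, x):
    score = sum(alpha * stump_predict(s, x) for alpha, s in ensemble)
    return 1 if score >= 0 else -1
```

Whether those two requirements are best expressed through the existing Classifier/Regressor classes or through new traits is exactly the abstraction question above.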

If you have time for that, it'd be great if you could post it here as a Google 
doc or PDF to collect feedback.  Thanks!

> Add generic boosting algorithm to spark.ml
> ------------------------------------------
>
>                 Key: SPARK-7129
>                 URL: https://issues.apache.org/jira/browse/SPARK-7129
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML
>            Reporter: Joseph K. Bradley
>
> The Pipelines API will make it easier to create a generic Boosting algorithm 
> which can work with any Classifier or Regressor. Creating this feature will 
> require researching the possible variants and extensions of boosting which we 
> may want to support now and/or in the future, and planning an API which will 
> be properly extensible.
> In particular, it will be important to think about supporting:
> * multiple loss functions (for AdaBoost, LogitBoost, gradient boosting, etc.)
> * multiclass variants
> * multilabel variants (which will probably be in a separate class and JIRA)
> * more esoteric variants (totally corrective boosting, cascaded models), which 
> we should consider but not design too heavily around
> Note: This may interact some with the existing tree ensemble methods, but it 
> should be largely separate since the tree ensemble APIs and implementations 
> are specialized for trees.
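
The variants listed in the description differ mainly in their loss function, which suggests a generic API may want to parameterize the loss.  A minimal sketch (plain Python, not Spark code; function names are hypothetical) of gradient boosting with a pluggable loss: the boosting loop needs only the negative gradient of the loss (the pseudo-residual), so swapping losses recovers the different variants.

```python
import math

# Each entry maps (label, current prediction) to the negative gradient
# of the corresponding loss, i.e. the pseudo-residual to fit next.
LOSSES = {
    "squared": lambda y, f: y - f,                       # L2 regression
    "logistic": lambda y, f: y / (1 + math.exp(y * f)),  # LogitBoost-style, y in {-1,+1}
    "exponential": lambda y, f: y * math.exp(-y * f),    # AdaBoost's loss
}

def fit_regression_stump(X, r):
    """Least-squares 1-D stump: pick the threshold minimizing SSE,
    predicting the mean residual on each side."""
    best = None
    for t in sorted(set(X)):
        left = [ri for xi, ri in zip(X, r) if xi <= t]
        right = [ri for xi, ri in zip(X, r) if xi > t]
        lm = sum(left) / len(left) if left else 0.0
        rm = sum(right) / len(right) if right else 0.0
        sse = (sum((ri - lm) ** 2 for ri in left)
               + sum((ri - rm) ** 2 for ri in right))
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return (t, lm, rm)

def stump_predict(stump, x):
    t, lm, rm = stump
    return lm if x <= t else rm

def gradient_boost(X, y, loss="squared", rounds=50, lr=0.1):
    """Fit an additive model F(x) = sum of lr * h_m(x), where each h_m
    is a regression stump fit to the current pseudo-residuals."""
    neg_grad = LOSSES[loss]
    F = [0.0] * len(X)
    ensemble = []
    for _ in range(rounds):
        r = [neg_grad(yi, fi) for yi, fi in zip(y, F)]
        stump = fit_regression_stump(X, r)
        ensemble.append(stump)
        F = [fi + lr * stump_predict(stump, xi) for fi, xi in zip(F, X)]
    return ensemble

def predict(ensemble, x, lr=0.1):
    return sum(lr * stump_predict(s, x) for s in ensemble)
```

In this framing, the only requirement on the base learner is fitting real-valued targets, which is one reason the abstraction question (Classifier vs. Regressor vs. a shared trait) matters.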



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
