[ https://issues.apache.org/jira/browse/SPARK-7129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14903180#comment-14903180 ]
Seth Hendrickson commented on SPARK-7129:
-----------------------------------------
I had some time to think about this topic and started putting together [a
document with notes on a generic boosting
architecture|https://docs.google.com/document/d/1Zeoj99gwiJBF0JWL8170KicVB0U5xtUOk6VUeFj0Nz8/edit]
and some of the concerns it raises. I don't think it is acceptable as a
design doc yet, because it's a bit wordy and doesn't follow the
structure of other design docs, but hopefully [~meihuawu] can find something
useful in it.
I found the [R mboost
vignette|https://cran.r-project.org/web/packages/mboost/vignettes/mboost_tutorial.pdf]
to be a good starting point. I'm still learning the ML package, but I'd love
to be involved in the discussion and potentially take on some of the code tasks
once we get there.
> Add generic boosting algorithm to spark.ml
> ------------------------------------------
>
> Key: SPARK-7129
> URL: https://issues.apache.org/jira/browse/SPARK-7129
> Project: Spark
> Issue Type: New Feature
> Components: ML
> Reporter: Joseph K. Bradley
>
> The Pipelines API will make it easier to create a generic Boosting algorithm
> which can work with any Classifier or Regressor. Creating this feature will
> require researching the possible variants and extensions of boosting which we
> may want to support now and/or in the future, and planning an API which will
> be properly extensible.
> In particular, it will be important to think about supporting:
> * multiple loss functions (for AdaBoost, LogitBoost, gradient boosting, etc.)
> * multiclass variants
> * multilabel variants (which will probably be in a separate class and JIRA)
> * For more esoteric variants, we should consider them but not design too much
> around them: totally corrective boosting, cascaded models
> Note: This may interact some with the existing tree ensemble methods, but it
> should be largely separate since the tree ensemble APIs and implementations
> are specialized for trees.
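
To make the "any Classifier or Regressor" idea from the issue description a bit more concrete, here is a minimal sketch of a boosting loop in which both the base learner and the loss are pluggable. It is plain Python, not Spark code, and it is not taken from the design notes: the names Stump, GenericBooster, make_learner and negative_gradient are all hypothetical, and it only covers single-feature regression with functional gradient boosting.

```python
from typing import Callable, List, Sequence


class Stump:
    """Hypothetical base learner: a one-split regression stump on one feature."""

    def fit(self, xs: Sequence[float], ys: Sequence[float]) -> "Stump":
        best = None
        for t in sorted(set(xs)):
            left = [y for x, y in zip(xs, ys) if x <= t]
            right = [y for x, y in zip(xs, ys) if x > t]
            left_mean = sum(left) / len(left)
            # If nothing falls to the right, reuse the left mean.
            right_mean = sum(right) / len(right) if right else left_mean
            sse = (sum((y - left_mean) ** 2 for y in left)
                   + sum((y - right_mean) ** 2 for y in right))
            if best is None or sse < best[0]:
                best = (sse, t, left_mean, right_mean)
        _, self.threshold, self.left, self.right = best
        return self

    def predict(self, x: float) -> float:
        return self.left if x <= self.threshold else self.right


def squared_loss_negative_gradient(y: float, f: float) -> float:
    """Negative gradient of 0.5 * (y - f)^2 with respect to f."""
    return y - f


def absolute_loss_negative_gradient(y: float, f: float) -> float:
    """Negative (sub)gradient of |y - f| with respect to f: the sign of y - f."""
    return float((y > f) - (y < f))


class GenericBooster:
    """Gradient boosting over any learner exposing fit(xs, ys) and predict(x)."""

    def __init__(self,
                 make_learner: Callable[[], object],
                 negative_gradient: Callable[[float, float], float],
                 rounds: int = 50,
                 step: float = 0.1) -> None:
        self.make_learner = make_learner
        self.negative_gradient = negative_gradient
        self.rounds = rounds
        self.step = step

    def fit(self, xs: Sequence[float], ys: Sequence[float]) -> "GenericBooster":
        # Start from a constant model (the mean of the labels).
        self.init = sum(ys) / len(ys)
        self.learners: List[object] = []
        preds = [self.init] * len(ys)
        for _ in range(self.rounds):
            # Fit the next base learner to the loss's negative gradient.
            targets = [self.negative_gradient(y, f) for y, f in zip(ys, preds)]
            learner = self.make_learner().fit(xs, targets)
            self.learners.append(learner)
            preds = [f + self.step * learner.predict(x)
                     for x, f in zip(xs, preds)]
        return self

    def predict(self, x: float) -> float:
        return self.init + self.step * sum(l.predict(x) for l in self.learners)
```

Swapping the loss only means passing a different negative_gradient function, and swapping the base learner only means passing a different make_learner factory. Reweighting schemes such as AdaBoost, and the multiclass variants listed in the description, would need more than this interface offers, which is part of why the API needs planning.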