[ https://issues.apache.org/jira/browse/SPARK-19747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131340#comment-16131340 ]
Joseph K. Bradley commented on SPARK-19747:
-------------------------------------------
Just saying: Thanks a lot for doing this reorg! It's a nice step towards
having pluggable algorithms.
> Consolidate code in ML aggregators
> ----------------------------------
>
> Key: SPARK-19747
> URL: https://issues.apache.org/jira/browse/SPARK-19747
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Affects Versions: 2.2.0
> Reporter: Seth Hendrickson
> Priority: Minor
>
> Many algorithms in Spark ML are posed as the optimization of a differentiable
> loss function over a parameter vector. We implement these by having a loss
> function accumulate the gradient using an Aggregator class whose methods
> amount to a {{seqOp}} and a {{combOp}}. As a result, pretty much every
> algorithm of this form implements both a cost function class and an
> aggregator class; the two are completely separate from one another, yet
> share probably 80% of their code.
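> To make the duplication concrete, here is a minimal sketch of the aggregator
> half of the pattern (simplified names, not the actual Spark source); only the
> body of {{add}} really differs from one algorithm to the next:
> {code:scala}
> import org.apache.spark.ml.linalg.{Vector, Vectors}
>
> // Sketch of one of today's per-algorithm aggregators. Driver-side usage:
> //   val agg = instances.treeAggregate(new SomeAggregator(dim))(
> //     (a, x) => a.add(x.label, x.weight, x.features),  // seqOp
> //     (a, b) => a.merge(b))                            // combOp
> class SomeAggregator(dim: Int) extends Serializable {
>   private var lossSum = 0.0
>   private var weightSum = 0.0
>   private val gradientSum = Array.ofDim[Double](dim)
>
>   // seqOp: fold one weighted, labeled instance into this partial result.
>   // This is the only algorithm-specific piece.
>   def add(label: Double, weight: Double, features: Vector): this.type = {
>     // ... per-algorithm loss and gradient updates go here ...
>     this
>   }
>
>   // combOp: merge another partition's partial result. Nearly verbatim copies
>   // of this boilerplate appear in every aggregator today.
>   def merge(other: SomeAggregator): this.type = {
>     lossSum += other.lossSum
>     weightSum += other.weightSum
>     var i = 0
>     while (i < dim) { gradientSum(i) += other.gradientSum(i); i += 1 }
>     this
>   }
>
>   def loss: Double = lossSum / weightSum
>   def gradient: Vector = Vectors.dense(gradientSum.map(_ / weightSum))
> }
> {code}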
> I think it is important to clean things like this up; if we can do it
> properly, it will make the code much more maintainable, readable, and
> bug-free. It will also reduce the overhead of implementing future algorithms.
> The design is of course open for discussion, but I think we should aim to:
> 1. Have all aggregators share parent classes, so that each one only needs to
> implement the {{add}} function; that is really the only place the current
> aggregators differ.
> 2. Have a single, generic cost function that is parameterized by the
> aggregator type. This replaces the many near-identical cost function
> implementations and greatly reduces the amount of duplicated code (a sketch
> of both ideas follows below).
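> Below is a rough sketch of how (1) and (2) could fit together (illustrative
> names and signatures only, not a committed API; ML's real Instance case class
> is package-private, so it is mirrored here to keep the sketch self-contained):
> {code:scala}
> import scala.reflect.ClassTag
>
> import org.apache.spark.ml.linalg.{Vector, Vectors}
> import org.apache.spark.rdd.RDD
>
> // Mirror of ML's package-private Instance case class.
> case class Instance(label: Double, weight: Double, features: Vector)
>
> // Parent trait: owns the shared state and the common combOp-style merge, so
> // a concrete aggregator only has to implement add (point 1).
> trait DifferentiableLossAggregator[Agg <: DifferentiableLossAggregator[Agg]]
>     extends Serializable { self: Agg =>
>
>   protected val dim: Int
>   protected var lossSum = 0.0
>   protected var weightSum = 0.0
>   protected lazy val gradientSumArray: Array[Double] = Array.ofDim[Double](dim)
>
>   // The only algorithm-specific method.
>   def add(instance: Instance): Agg
>
>   // Shared combOp, identical across all of today's aggregators.
>   def merge(other: Agg): Agg = {
>     lossSum += other.lossSum
>     weightSum += other.weightSum
>     var i = 0
>     while (i < dim) { gradientSumArray(i) += other.gradientSumArray(i); i += 1 }
>     this
>   }
>
>   def loss: Double = lossSum / weightSum
>   def gradient: Vector = Vectors.dense(gradientSumArray.map(_ / weightSum))
> }
>
> // Single generic cost function, parameterized by the aggregator type
> // (point 2): the treeAggregate plumbing is written exactly once.
> class RDDLossFunction[Agg <: DifferentiableLossAggregator[Agg] : ClassTag](
>     instances: RDD[Instance],
>     getAggregator: Vector => Agg) extends Serializable {
>
>   def calculate(coefficients: Vector): (Double, Vector) = {
>     val agg = instances.treeAggregate(getAggregator(coefficients))(
>       seqOp = (a, x) => a.add(x),
>       combOp = (a, b) => a.merge(b))
>     (agg.loss, agg.gradient)
>   }
> }
> {code}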