[ https://issues.apache.org/jira/browse/SPARK-19747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131340#comment-16131340 ]
Joseph K. Bradley commented on SPARK-19747:
-------------------------------------------
Just saying: Thanks a lot for doing this reorg! It's a nice step towards
having pluggable algorithms.
> Consolidate code in ML aggregators
> ----------------------------------
>
> Key: SPARK-19747
> URL: https://issues.apache.org/jira/browse/SPARK-19747
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Affects Versions: 2.2.0
> Reporter: Seth Hendrickson
> Priority: Minor
>
> Many algorithms in Spark ML are posed as the optimization of a differentiable
> loss function over a parameter vector. We implement these by having a loss
> function accumulate the gradient using an Aggregator class whose methods
> amount to a {{seqOp}} and a {{combOp}}. As a result, pretty much every
> algorithm of this form implements both a cost function class and an
> aggregator class; the two are completely separate from one another, yet
> share probably 80% of their code.
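> To make the duplication concrete, here is a minimal sketch of the aggregator
> half of the pattern (simplified names, not the actual Spark source); only the
> body of {{add}} really differs from one algorithm to the next:
> {code:scala}
> import org.apache.spark.ml.linalg.{Vector, Vectors}
>
> // Sketch of one of today's per-algorithm aggregators. Driver-side usage:
> //   val agg = instances.treeAggregate(new SomeAggregator(dim))(
> //     (a, x) => a.add(x.label, x.weight, x.features),  // seqOp
> //     (a, b) => a.merge(b))                            // combOp
> class SomeAggregator(dim: Int) extends Serializable {
>   private var lossSum = 0.0
>   private var weightSum = 0.0
>   private val gradientSum = Array.ofDim[Double](dim)
>
>   // seqOp: fold one weighted, labeled instance into this partial result.
>   // This is the only algorithm-specific piece.
>   def add(label: Double, weight: Double, features: Vector): this.type = {
>     // ... per-algorithm loss and gradient updates go here ...
>     this
>   }
>
>   // combOp: merge another partition's partial result. Nearly verbatim copies
>   // of this boilerplate appear in every aggregator today.
>   def merge(other: SomeAggregator): this.type = {
>     lossSum += other.lossSum
>     weightSum += other.weightSum
>     var i = 0
>     while (i < dim) { gradientSum(i) += other.gradientSum(i); i += 1 }
>     this
>   }
>
>   def loss: Double = lossSum / weightSum
>   def gradient: Vector = Vectors.dense(gradientSum.map(_ / weightSum))
> }
> {code}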
> I think it is important to clean things like this up; if we can do it
> properly, it will make the code much more maintainable, readable, and
> bug-free. It will also reduce the overhead of implementing future algorithms.
> The design is of course open for discussion, but I think we should aim to:
> 1. Have all aggregators share parent classes, so that each one only needs to
> implement the {{add}} function; that is really the only place the current
> aggregators differ.
> 2. Have a single, generic cost function that is parameterized by the
> aggregator type. This replaces the many near-identical cost function
> implementations and greatly reduces the amount of duplicated code (a sketch
> of both ideas follows below).
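> Below is a rough sketch of how (1) and (2) could fit together (illustrative
> names and signatures only, not a committed API; ML's real Instance case class
> is package-private, so it is mirrored here to keep the sketch self-contained):
> {code:scala}
> import scala.reflect.ClassTag
>
> import org.apache.spark.ml.linalg.{Vector, Vectors}
> import org.apache.spark.rdd.RDD
>
> // Mirror of ML's package-private Instance case class.
> case class Instance(label: Double, weight: Double, features: Vector)
>
> // Parent trait: owns the shared state and the common combOp-style merge, so
> // a concrete aggregator only has to implement add (point 1).
> trait DifferentiableLossAggregator[Agg <: DifferentiableLossAggregator[Agg]]
>     extends Serializable { self: Agg =>
>
>   protected val dim: Int
>   protected var lossSum = 0.0
>   protected var weightSum = 0.0
>   protected lazy val gradientSumArray: Array[Double] = Array.ofDim[Double](dim)
>
>   // The only algorithm-specific method.
>   def add(instance: Instance): Agg
>
>   // Shared combOp, identical across all of today's aggregators.
>   def merge(other: Agg): Agg = {
>     lossSum += other.lossSum
>     weightSum += other.weightSum
>     var i = 0
>     while (i < dim) { gradientSumArray(i) += other.gradientSumArray(i); i += 1 }
>     this
>   }
>
>   def loss: Double = lossSum / weightSum
>   def gradient: Vector = Vectors.dense(gradientSumArray.map(_ / weightSum))
> }
>
> // Single generic cost function, parameterized by the aggregator type
> // (point 2): the treeAggregate plumbing is written exactly once.
> class RDDLossFunction[Agg <: DifferentiableLossAggregator[Agg] : ClassTag](
>     instances: RDD[Instance],
>     getAggregator: Vector => Agg) extends Serializable {
>
>   def calculate(coefficients: Vector): (Double, Vector) = {
>     val agg = instances.treeAggregate(getAggregator(coefficients))(
>       seqOp = (a, x) => a.add(x),
>       combOp = (a, b) => a.merge(b))
>     (agg.loss, agg.gradient)
>   }
> }
> {code}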