[jira] [Commented] (SPARK-6567) Large linear model parallelism via a join and reduceByKey

Joseph K. Bradley (JIRA) Sun, 26 Mar 2017 17:02:32 -0700

    [ 
https://issues.apache.org/jira/browse/SPARK-6567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15942498#comment-15942498
 ]


Joseph K. Bradley commented on SPARK-6567:
------------------------------------------

Linking [SPARK-10078], which tracks adding vector-free L-BFGS.

> Large linear model parallelism via a join and reduceByKey
> ---------------------------------------------------------
>
>                 Key: SPARK-6567
>                 URL: https://issues.apache.org/jira/browse/SPARK-6567
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML, MLlib
>            Reporter: Reza Zadeh
>         Attachments: model-parallelism.pptx
>
>
> To train a linear model, every iteration must compute the dot product of 
> each training point with the model. If the model is large (too large to fit 
> in memory on a single machine), then SPARK-4590 proposes using a parameter 
> server.
> There is an easier way to achieve this without parameter servers. In 
> particular, if the data is held as a BlockMatrix and the model as an RDD, 
> then each block can be joined with the relevant part of the model, followed 
> by a reduceByKey to sum the partial results into the full dot products.
> This obviates the need for a parameter server, at least for linear models. 
> However, it is unclear how its performance compares to that of parameter 
> servers.
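
The join-then-reduceByKey idea in the description can be sketched roughly as
follows. This is not Spark code: `join` and `reduce_by_key` are plain-Python
stand-ins for the RDD operations of the same name, and the block layout, the
keying by column-block index, and all variable names are assumptions made for
illustration only.

```python
from collections import defaultdict

def join(left, right):
    """Inner join of two lists of (key, value) pairs (stand-in for RDD.join)."""
    index = defaultdict(list)
    for key, value in right:
        index[key].append(value)
    return [(k, (lv, rv)) for k, lv in left for rv in index.get(k, ())]

def reduce_by_key(pairs, func):
    """Merge values that share a key (stand-in for RDD.reduceByKey)."""
    acc = {}
    for key, value in pairs:
        acc[key] = func(acc[key], value) if key in acc else value
    return acc

# Hypothetical layout: each data block is keyed by its column-block index,
# so it can be joined with the slice of the model covering the same columns.
# Entries are (col_block, (row_block, rows_of_the_block)).
blocks = [
    (0, (0, [[1.0, 2.0], [0.0, 1.0]])),  # rows 0-1, columns 0-1
    (1, (0, [[3.0], [4.0]])),            # rows 0-1, column 2
    (0, (1, [[2.0, 0.0]])),              # row 2, columns 0-1
    (1, (1, [[1.0]])),                   # row 2, column 2
]

# The model is split the same way: (col_block, weights_for_those_columns).
model = [(0, [0.5, -1.0]), (1, [2.0])]

def partial_dots(block, weights):
    """Dot product of every row in a block with the matching model slice."""
    return [sum(x * w for x, w in zip(row, weights)) for row in block]

# Step 1: join each block with the relevant part of the model.
joined = join(blocks, model)

# Step 2: compute partial dot products and re-key them by row block.
partials = [
    (row_block, partial_dots(block, weights))
    for _, ((row_block, block), weights) in joined
]

# Step 3: reduce by key, summing the partial results per row block.
dots = reduce_by_key(partials, lambda a, b: [x + y for x, y in zip(a, b)])
```

In this sketch no single worker ever needs the whole model: each joined pair
holds one data block and only the model slice for its columns. The cost is a
shuffle in both the join and the reduce, which is where the open performance
question relative to parameter servers would show up.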



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

