[ https://issues.apache.org/jira/browse/SPARK-6567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15371816#comment-15371816 ]

Ben McCann commented on SPARK-6567:
-----------------------------------

[~hucheng], can you share your code for this?

> Large linear model parallelism via a join and reduceByKey
> ---------------------------------------------------------
>
>                 Key: SPARK-6567
>                 URL: https://issues.apache.org/jira/browse/SPARK-6567
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML, MLlib
>            Reporter: Reza Zadeh
>         Attachments: model-parallelism.pptx
>
>
> To train a linear model, each training point in the training set needs its 
> dot product with the model computed once per iteration. If the model is too 
> large to fit in memory on a single machine, then SPARK-4590 proposes using 
> a parameter server.
> There is an easier way to achieve this without parameter servers. In 
> particular, if the data is held as a BlockMatrix and the model as an RDD, 
> then each block can be joined with the matching part of the model, followed 
> by a reduceByKey that sums the partial dot products (a sketch follows below).
> This obviates the need for a parameter server, at least for linear models. 
> However, it is unclear how this approach compares to parameter servers 
> performance-wise.
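
A minimal sketch of the join-and-reduceByKey step described above, in Scala.
This is not the attached code or the SPARK-4590 design; all names here
(blockId, exampleId, featureSlice) are illustrative, and it assumes the
feature vectors have been pre-split into fixed-size blocks keyed by block id:

    import org.apache.spark.{SparkConf, SparkContext}

    object BlockedDotProducts {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("blocked-dot").setMaster("local[*]"))

        // (blockId, (exampleId, featureSlice)): the slice of example i
        // that falls into feature block b
        val data = sc.parallelize(Seq(
          (0, (1L, Array(1.0, 2.0))), (1, (1L, Array(3.0, 0.0))),
          (0, (2L, Array(0.0, 1.0))), (1, (2L, Array(1.0, 1.0)))
        ))

        // (blockId, modelSlice): the coefficients for block b; kept as an
        // RDD so the full model never has to fit on a single machine
        val model = sc.parallelize(Seq(
          (0, Array(0.5, -1.0)), (1, Array(2.0, 4.0))
        ))

        // Join each data block with the matching model slice, compute the
        // partial dot product, then sum the partials per example.
        val margins = data.join(model)
          .map { case (_, ((exampleId, xs), ws)) =>
            val partial = xs.zip(ws).map { case (x, w) => x * w }.sum
            (exampleId, partial)
          }
          .reduceByKey(_ + _)  // (exampleId, full dot product w . x_i)

        margins.collect().foreach { case (id, m) =>
          println(s"example $id: margin = $m")
        }
        sc.stop()
      }
    }

With this layout the model stays partitioned across the cluster for the whole
iteration, and the reduceByKey shuffle moves only one partial sum per
(example, block) pair rather than any model coefficients.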


