[
https://issues.apache.org/jira/browse/SPARK-6567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15942498#comment-15942498
]
Joseph K. Bradley commented on SPARK-6567:
------------------------------------------
Linking [SPARK-10078], which tracks adding vector-free L-BFGS.
> Large linear model parallelism via a join and reduceByKey
> ---------------------------------------------------------
>
> Key: SPARK-6567
> URL: https://issues.apache.org/jira/browse/SPARK-6567
> Project: Spark
> Issue Type: Improvement
> Components: ML, MLlib
> Reporter: Reza Zadeh
> Attachments: model-parallelism.pptx
>
>
> Training a linear model requires computing, in every iteration, the dot
> product of each training point with the model. If the model is too large to
> fit in memory on a single machine, SPARK-4590 proposes using a parameter
> server.
> There is an easier way to achieve this without parameter servers. In
> particular, if the data is held as a BlockMatrix and the model as an RDD,
> then each block can be joined with the relevant part of the model, followed
> by a reduceByKey to compute the dot products.
> This obviates the need for a parameter server, at least for linear models.
> However, it's unclear how it compares performance-wise to parameter servers.
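
The join-then-reduceByKey dataflow described in the quoted issue can be illustrated with a small sketch. This is plain Python that emulates the two steps on in-memory lists, not Spark code and not the implementation proposed in the ticket. The function name `dot_products_by_join`, the `(row_block, col_block)` keying of data blocks, and the keying of model slices by `col_block` are all assumptions made for the example.

```python
def dot_products_by_join(data_blocks, model_blocks):
    """Emulate join + reduceByKey for blocked dot products (hypothetical helper).

    data_blocks:  list of ((row_block, col_block), rows), where rows is a list
                  of equal-length lists holding one block of the data matrix.
    model_blocks: list of (col_block, weights), the matching slice of the model.
    Returns a dict mapping row_block -> list of dot products for its rows.
    """
    # "Join" step: pair every data block with the model slice that shares
    # its column-block index, then compute partial dot products locally.
    model_by_col = dict(model_blocks)
    partials = []
    for (row_block, col_block), rows in data_blocks:
        weights = model_by_col[col_block]
        partial = [sum(x * w for x, w in zip(row, weights)) for row in rows]
        partials.append((row_block, partial))

    # "reduceByKey" step: sum the partial results that share a row block.
    totals = {}
    for row_block, partial in partials:
        if row_block in totals:
            totals[row_block] = [a + b for a, b in zip(totals[row_block], partial)]
        else:
            totals[row_block] = partial
    return totals


# Data matrix [[1, 2, 3, 4], [5, 6, 7, 8]] split into 2 x 2 blocks of shape 1 x 2,
# and model [1, 0, -1, 2] split into two slices of length 2.
data_blocks = [
    ((0, 0), [[1.0, 2.0]]), ((0, 1), [[3.0, 4.0]]),
    ((1, 0), [[5.0, 6.0]]), ((1, 1), [[7.0, 8.0]]),
]
model_blocks = [(0, [1.0, 0.0]), (1, [-1.0, 2.0])]
result = dot_products_by_join(data_blocks, model_blocks)
```

In this sketch no single step ever holds the full model: each block only sees the slice for its own column range, which is the property the issue relies on. How this would perform on a cluster relative to a parameter server is left open, as the issue itself notes.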