Reza Zadeh created SPARK-6567:
---------------------------------

             Summary: Large linear model parallelism via a join and reduceByKey
                 Key: SPARK-6567
                 URL: https://issues.apache.org/jira/browse/SPARK-6567
             Project: Spark
          Issue Type: Improvement
          Components: ML, MLlib
            Reporter: Reza Zadeh


To train a linear model, the dot product of each training point against the
model must be computed in every iteration. If the model is too large to fit in
memory on a single machine, SPARK-4590 proposes using a parameter server.

There is an easier way to achieve this without parameter servers. In 
particular, if the data is held as a BlockMatrix and the model as an RDD, then 
each block can be joined with the relevant part of the model, followed by a 
reduceByKey to compute the dot products.
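
A minimal sketch of the idea using plain RDDs with toy data (the block/segment
layout, sizes, and names here are illustrative assumptions; a real
implementation would build on mllib's BlockMatrix):

{code:scala}
import org.apache.spark.{SparkConf, SparkContext}

object BlockDotProducts {
  // Multiply one rows x cols block (stored row-major) by a model segment.
  def gemv(block: Array[Double], rows: Int, cols: Int, w: Array[Double]): Array[Double] = {
    val out = new Array[Double](rows)
    var i = 0
    while (i < rows) {
      var s = 0.0
      var j = 0
      while (j < cols) { s += block(i * cols + j) * w(j); j += 1 }
      out(i) = s
      i += 1
    }
    out
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("block-dot").setMaster("local[*]"))
    val (rows, cols) = (2, 2) // toy 2x2 blocks

    // The data matrix, held as blocks keyed by (rowBlock, colBlock).
    val blocks = sc.parallelize(Seq(
      ((0, 0), Array(1.0, 2.0, 3.0, 4.0)),
      ((0, 1), Array(5.0, 6.0, 7.0, 8.0))))

    // The model, held as an RDD of segments keyed by column block, so it
    // never needs to fit in memory on a single machine.
    val model = sc.parallelize(Seq(
      (0, Array(1.0, 1.0)),
      (1, Array(2.0, 2.0))))

    // Join each block with the model segment covering its columns to get the
    // block's partial dot products, then reduceByKey over row blocks to sum
    // the partials into the full dot product of every training point.
    val dots = blocks
      .map { case ((rowBlock, colBlock), m) => (colBlock, (rowBlock, m)) }
      .join(model)
      .map { case (_, ((rowBlock, m), w)) => (rowBlock, gemv(m, rows, cols, w)) }
      .reduceByKey((a, b) => a.zip(b).map { case (x, y) => x + y })

    dots.collect().foreach { case (rb, v) => println(s"row block $rb: ${v.mkString(", ")}") }
    sc.stop()
  }
}
{code}

In this toy run, row block 0 yields (25.0, 37.0): the dot products of the two
training points against the full, distributed model. Per iteration, the
shuffle cost is one join plus one reduceByKey.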

This obviates the need for a parameter server, at least for linear models. 
However, it is unclear how this approach compares to a parameter server in 
performance.


