[ https://issues.apache.org/jira/browse/SPARK-4590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14497511#comment-14497511 ]

Reza Zadeh commented on SPARK-4590:
-----------------------------------

I agree IndexedRDD is not the best way forward as it won't have the desired 
throughput.

It is worth mentioning that, at least for linear models, we have figured out
how to train without a parameter server. See SPARK-6567.

We are currently leaning towards doing option (1) with some default 
implementation, but first we should evaluate how far we can get without 
parameter servers. Most of our needs could be satisfied with SPARK-6567, at a 
fraction of the infrastructure building cost.
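The comment does not say how SPARK-6567 avoids a parameter server. As a hedged illustration only, assuming an approach built from a join followed by a reduce-by-key (weights kept as a keyed collection instead of one driver-side vector), the sketch below computes a least-squares gradient that way. Plain Python dicts and generators stand in for keyed RDDs; `reduce_by_key` and `gradient_via_join` are hypothetical helpers, not Spark APIs, and this is not the SPARK-6567 implementation.

```python
# Hedged sketch: a least-squares gradient computed without a single
# driver-side weight vector, using only "join" and "reduce-by-key" steps.
# Dicts/generators stand in for keyed RDDs; the helpers are hypothetical.
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def reduce_by_key(pairs: Iterable[Tuple[int, float]]) -> Dict[int, float]:
    """Sum values that share a key (the role reduceByKey plays in Spark)."""
    out: Dict[int, float] = defaultdict(float)
    for key, value in pairs:
        out[key] += value
    return dict(out)


def gradient_via_join(
    entries: List[Tuple[int, int, float]],  # (example_id, feature_index, value), nonzeros only
    labels: Dict[int, float],               # example_id -> label
    weights: Dict[int, float],              # feature_index -> weight, kept keyed
) -> Dict[int, float]:
    # Step 1: pair each nonzero with its weight (a join on feature_index),
    # then sum per example to get the margins w.x.
    margins = reduce_by_key((ex, value * weights[feat]) for ex, feat, value in entries)
    # Step 2: residual per example; an example with no nonzeros has margin 0.
    residuals = {ex: margins.get(ex, 0.0) - y for ex, y in labels.items()}
    # Step 3: pair each nonzero with its example's residual (a join on
    # example_id), then sum per feature to get the gradient.
    return reduce_by_key((feat, value * residuals[ex]) for ex, feat, value in entries)


if __name__ == "__main__":
    entries = [(0, 0, 1.0), (1, 1, 1.0), (2, 0, 1.0), (2, 1, 1.0)]
    labels = {0: 2.0, 1: 3.0, 2: 5.0}
    print(gradient_via_join(entries, labels, {0: 0.0, 1: 0.0}))
```

The design point of this sketch is that no step needs the whole model in one place: each weight only meets the nonzeros that reference it. Whether that beats broadcast in practice would depend on shuffle cost, which the sketch does not model.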

> Early investigation of parameter server
> ---------------------------------------
>
>                 Key: SPARK-4590
>                 URL: https://issues.apache.org/jira/browse/SPARK-4590
>             Project: Spark
>          Issue Type: Brainstorming
>          Components: ML, MLlib
>            Reporter: Xiangrui Meng
>            Assignee: Reza Zadeh
>
> In the current implementation of GLM solvers, we save intermediate models
> on the driver node and update them through broadcast and aggregation. Even
> with torrent broadcast and tree aggregation added in 1.1, it is hard to go
> beyond ~10 million features. This JIRA is for investigating the parameter
> server approach, including algorithm, infrastructure, and dependencies.
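To make the driver-centric pattern in the issue description concrete, here is a minimal single-process simulation, assuming plain gradient descent on least squares: the full weight vector lives on the "driver", is copied ("broadcast") to every partition, each partition computes a partial gradient, and the partial gradients are summed pairwise in a tree. Lists stand in for RDD partitions; `partial_gradient`, `tree_aggregate`, and `gd_step` are hypothetical names, not MLlib code.

```python
# Minimal simulation of broadcast + tree aggregation for one GD step.
# Lists stand in for RDD partitions; no Spark APIs are used.
from typing import List, Tuple

Example = Tuple[List[float], float]  # (dense features, label)


def partial_gradient(weights: List[float], partition: List[Example]) -> List[float]:
    """Least-squares gradient summed over one partition."""
    grad = [0.0] * len(weights)
    for x, y in partition:
        residual = sum(w * xi for w, xi in zip(weights, x)) - y
        for j, xi in enumerate(x):
            grad[j] += residual * xi
    return grad


def tree_aggregate(vectors: List[List[float]]) -> List[float]:
    """Sum vectors level by level, pairing neighbours, as in a binary tree."""
    level = vectors
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                nxt.append([a + b for a, b in zip(level[i], level[i + 1])])
            else:
                nxt.append(level[i])  # odd one out moves up unchanged
        level = nxt
    return level[0]


def gd_step(weights: List[float], partitions: List[List[Example]], step_size: float) -> List[float]:
    n = sum(len(p) for p in partitions)
    # "Broadcast": every partition receives a full copy of the d-dimensional model.
    broadcast = [list(weights) for _ in partitions]
    partials = [partial_gradient(w, p) for w, p in zip(broadcast, partitions)]
    # "Tree aggregation": d-dimensional partial gradients are combined back.
    grad = tree_aggregate(partials)
    # The driver applies the averaged gradient to its single copy of the model.
    return [w - step_size * g / n for w, g in zip(weights, grad)]


if __name__ == "__main__":
    data = [([1.0, 0.0], 2.0), ([0.0, 1.0], 3.0), ([1.0, 1.0], 5.0), ([2.0, 1.0], 7.0)]
    partitions = [data[:2], data[2:3], data[3:]]
    w = [0.0, 0.0]
    for _ in range(1000):
        w = gd_step(w, partitions, 0.1)
    print(w)  # close to [2.0, 3.0]
```

Every iteration ships d values to each partition and d values back up the tree. At d = 10^7 with 8-byte doubles, one copy is already about 80 MB, which gives a rough sense of why the description puts the practical ceiling near ~10 million features.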



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
