[ https://issues.apache.org/jira/browse/SYSTEMML-2420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

LI Guobao updated SYSTEMML-2420:
--------------------------------
    Description: This task implements the parameter exchange between the
parameter server (ps) and the workers. We could build our own RPC layer on top
of the netty framework. In general, the netty-based {{TransportClient}} and
{{TransportServer}} provide the sending and receiving services for the ps and
the workers. Extending {{RpcHandler}} allows the server to invoke the
corresponding ps method (i.e., push or pull) according to the type of the
incoming RPC call object. The {{SparkPsProxy}}, which wraps a
{{TransportClient}}, then allows the workers to execute push/pull calls against
the server. In addition, the ps netty server provides a file repository
service from which the workers can download the partitioned training data.
Each worker can thus rebuild its matrix object from the transferred file,
instead of Spark broadcasting all the files, not all of which each worker
needs.  (was: This task implements the parameter exchange between the ps and
the workers. We could leverage Spark RPC to set up a ps endpoint on the driver
node, so that the ps service can be discovered by the workers on the network.
The workers could then invoke the pull/push methods via RPC using the
registered endpoint of the ps service. In detail, this task consists of
registering the ps endpoint in the Spark RPC framework and using RPC to invoke
the target method on the worker side. Spark RPC is implemented in Scala, so we
need to wrap it in order to use it from Java. Overall, we could register the
ps service with _RpcEndpoint_ and invoke the service with _RpcEndpointRef_.)

> Communication between ps and workers
> ------------------------------------
>
>                 Key: SYSTEMML-2420
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-2420
>             Project: SystemML
>          Issue Type: Sub-task
>            Reporter: LI Guobao
>            Assignee: LI Guobao
>            Priority: Major
>         Attachments: systemml_rpc_2_seq_diagram.png, 
> systemml_rpc_sequence_diagram.png
>
>
> This task implements the parameter exchange between the parameter server
> (ps) and the workers. We could build our own RPC layer on top of the netty
> framework. In general, the netty-based {{TransportClient}} and
> {{TransportServer}} provide the sending and receiving services for the ps
> and the workers. Extending {{RpcHandler}} allows the server to invoke the
> corresponding ps method (i.e., push or pull) according to the type of the
> incoming RPC call object. The {{SparkPsProxy}}, which wraps a
> {{TransportClient}}, then allows the workers to execute push/pull calls
> against the server. In addition, the ps netty server provides a file
> repository service from which the workers can download the partitioned
> training data. Each worker can thus rebuild its matrix object from the
> transferred file, instead of Spark broadcasting all the files, not all of
> which each worker needs.
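
The push/pull dispatch described above could look roughly like the following. This is only a minimal sketch using the Java standard library; it does not use the real {{RpcHandler}}, {{TransportClient}}, or {{SparkPsProxy}} classes. The class name PsRpcSketch, the wire layout, and the "add the pushed values to the parameters" update rule are all assumptions made for illustration.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: none of these names come from SystemML or Spark.
final class PsRpcSketch {
    // Call type tags written at the start of every request (assumed layout).
    static final int PUSH = 1;
    static final int PULL = 2;

    // Stand-in for the parameter server state: one dense parameter vector.
    private final double[] params;

    PsRpcSketch(int size) {
        this.params = new double[size];
    }

    // Worker side: encode a push call as [type][workerId][length][values...].
    static ByteBuffer encodePush(int workerId, double[] gradients) {
        ByteBuffer buf = ByteBuffer.allocate(12 + 8 * gradients.length);
        buf.putInt(PUSH).putInt(workerId).putInt(gradients.length);
        for (double g : gradients) {
            buf.putDouble(g);
        }
        buf.flip();
        return buf;
    }

    // Worker side: encode a pull call as [type][workerId].
    static ByteBuffer encodePull(int workerId) {
        ByteBuffer buf = ByteBuffer.allocate(8);
        buf.putInt(PULL).putInt(workerId);
        buf.flip();
        return buf;
    }

    // Server side: decode the call object and dispatch to push or pull.
    // The worker id is read but unused here; a real server might use it
    // to track per-worker state.
    synchronized ByteBuffer handle(ByteBuffer request) {
        int type = request.getInt();
        int workerId = request.getInt();
        if (type == PUSH) {
            int n = request.getInt();
            if (n != params.length) {
                throw new IllegalArgumentException(
                    "Worker " + workerId + " pushed " + n + " values, expected " + params.length);
            }
            // Assumed update rule: accumulate the pushed values.
            for (int i = 0; i < n; i++) {
                params[i] += request.getDouble();
            }
            return ByteBuffer.allocate(0); // empty acknowledgement
        }
        if (type == PULL) {
            ByteBuffer out = ByteBuffer.allocate(8 * params.length);
            for (double p : params) {
                out.putDouble(p);
            }
            out.flip();
            return out;
        }
        throw new IllegalArgumentException("Unknown call type: " + type);
    }

    // Worker side: decode a pull response back into a parameter vector.
    static double[] decodeParams(ByteBuffer response) {
        double[] values = new double[response.remaining() / 8];
        for (int i = 0; i < values.length; i++) {
            values[i] = response.getDouble();
        }
        return values;
    }
}
```

In this sketch a worker would call handle(encodePush(...)) and handle(encodePull(...)) directly. In the proposed design the same buffers would instead travel over the netty client/server pair, with the dispatch living in the extended handler.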



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
