[
https://issues.apache.org/jira/browse/SYSTEMML-2420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
LI Guobao updated SYSTEMML-2420:
--------------------------------
Description: This task implements the parameter exchange between the
parameter server (ps) and the workers. We could use the netty framework to
implement our own RPC framework. In general, the netty-based
{{TransportClient}} and {{TransportServer}} provide the sending and receiving
services for the ps and the workers. Extending {{RpcHandler}} allows the
server to invoke the corresponding ps method (i.e., push or pull) by handling
the different incoming RPC call objects. The {{SparkPsProxy}}, which wraps a
{{TransportClient}}, then allows the workers to execute push/pull calls on the
server. At the same time, the ps netty server also provides a file repository
service that lets the workers download the partitioned training data, so that
each worker can rebuild its matrix object from the transferred file instead of
having Spark broadcast all the files, not all of which are needed by every
worker. (was: This task implements the parameter
exchange between ps and workers. We could leverage Spark RPC to set up a ps
endpoint on the driver node, which means that the ps service could be
discovered by workers in the network. The workers could then invoke the
pull/push methods via RPC using the registered endpoint of the ps service. In
detail, this task consists of registering the ps endpoint in the Spark RPC
framework and using RPC to invoke the target method on the worker side. Spark
RPC is implemented in Scala, so we need to wrap it in order to use it from
Java. Overall, we could register the ps service with _RpcEndpoint_ and invoke
the service with _RpcEndpointRef_.)
> Communication between ps and workers
> ------------------------------------
>
> Key: SYSTEMML-2420
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2420
> Project: SystemML
> Issue Type: Sub-task
> Reporter: LI Guobao
> Assignee: LI Guobao
> Priority: Major
> Attachments: systemml_rpc_2_seq_diagram.png,
> systemml_rpc_sequence_diagram.png
>
>
> This task implements the parameter exchange between the parameter server
> (ps) and the workers. We could use the netty framework to implement our own
> RPC framework. In general, the netty-based {{TransportClient}} and
> {{TransportServer}} provide the sending and receiving services for the ps
> and the workers. Extending {{RpcHandler}} allows the server to invoke the
> corresponding ps method (i.e., push or pull) by handling the different
> incoming RPC call objects. The {{SparkPsProxy}}, which wraps a
> {{TransportClient}}, then allows the workers to execute push/pull calls on
> the server. At the same time, the ps netty server also provides a file
> repository service that lets the workers download the partitioned training
> data, so that each worker can rebuild its matrix object from the transferred
> file instead of having Spark broadcast all the files, not all of which are
> needed by every worker.
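
The push/pull dispatch described above could be sketched roughly as follows.
This is only an illustration under stated assumptions, not the SystemML
implementation: the names ParamServer, InMemoryPs, PsRpcHandler and PsProxy,
the one-byte call tags and the wire format are all hypothetical, and a plain
java.util.function.Function stands in for the netty-based transport so that
the example runs without any network dependency. The real {{RpcHandler}} and
{{TransportClient}} signatures are not reproduced here.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.function.Function;

public class PsRpcSketch {
    // Hypothetical one-byte tags identifying the RPC call type.
    static final byte PUSH = 1;
    static final byte PULL = 2;

    /** Hypothetical ps interface: workers push gradients and pull the model. */
    interface ParamServer {
        void push(int workerId, double[] gradients);
        double[] pull(int workerId);
    }

    /** Toy server that simply sums pushed gradients into the model. */
    static class InMemoryPs implements ParamServer {
        private final double[] model;
        InMemoryPs(int size) { model = new double[size]; }
        public synchronized void push(int workerId, double[] gradients) {
            for (int i = 0; i < model.length; i++) model[i] += gradients[i];
        }
        public synchronized double[] pull(int workerId) { return model.clone(); }
    }

    /** Server side: decodes an incoming call and invokes the matching ps method. */
    static class PsRpcHandler {
        private final ParamServer ps;
        PsRpcHandler(ParamServer ps) { this.ps = ps; }

        ByteBuffer receive(ByteBuffer msg) {
            byte type = msg.get();
            int workerId = msg.getInt();
            switch (type) {
                case PUSH:
                    ps.push(workerId, readDoubles(msg));
                    return ByteBuffer.allocate(0); // empty acknowledgement
                case PULL:
                    return writeDoubles(ps.pull(workerId));
                default:
                    throw new IllegalArgumentException("Unknown RPC type: " + type);
            }
        }
    }

    /** Worker side: encodes push/pull calls and sends them through the transport. */
    static class PsProxy {
        private final int workerId;
        private final Function<ByteBuffer, ByteBuffer> transport; // stand-in for a client

        PsProxy(int workerId, Function<ByteBuffer, ByteBuffer> transport) {
            this.workerId = workerId;
            this.transport = transport;
        }

        void push(double[] gradients) {
            ByteBuffer payload = writeDoubles(gradients);
            ByteBuffer msg = ByteBuffer.allocate(1 + 4 + payload.remaining());
            msg.put(PUSH).putInt(workerId).put(payload);
            msg.flip();
            transport.apply(msg);
        }

        double[] pull() {
            ByteBuffer msg = ByteBuffer.allocate(1 + 4);
            msg.put(PULL).putInt(workerId);
            msg.flip();
            return readDoubles(transport.apply(msg));
        }
    }

    // Length-prefixed encoding of a double array (assumed wire format).
    static ByteBuffer writeDoubles(double[] values) {
        ByteBuffer buf = ByteBuffer.allocate(4 + 8 * values.length);
        buf.putInt(values.length);
        for (double v : values) buf.putDouble(v);
        buf.flip();
        return buf;
    }

    static double[] readDoubles(ByteBuffer buf) {
        double[] values = new double[buf.getInt()];
        for (int i = 0; i < values.length; i++) values[i] = buf.getDouble();
        return values;
    }

    public static void main(String[] args) {
        PsRpcHandler handler = new PsRpcHandler(new InMemoryPs(2));
        PsProxy worker0 = new PsProxy(0, handler::receive);
        PsProxy worker1 = new PsProxy(1, handler::receive);
        worker0.push(new double[] {1.0, 2.0});
        worker1.push(new double[] {0.5, 0.5});
        System.out.println(Arrays.toString(worker0.pull())); // prints [1.5, 2.5]
    }
}
```

In this sketch the handler is the only place that knows how a call object maps
to a ps method, so the workers depend only on the proxy. Swapping the in-memory
Function for a real client/server pair would leave both sides unchanged.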