[ 
https://issues.apache.org/jira/browse/SINGA-57?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14745226#comment-14745226
 ] 

wangwei commented on SINGA-57:
------------------------------

An example cluster topology for distributed Hogwild is:

{code}
nworker_groups: 2
nserver_groups: 2
{code}

Each server group serves one worker group. One worker group and one server 
group are created in each process, so two processes are launched in total.
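
As a rough illustration of that layout (not SINGA code), the pairing can be 
sketched in Python. The function name colocated_layout and the dictionary keys 
are hypothetical, and the sketch assumes group i of each kind shares process i.

```python
def colocated_layout(nworker_groups, nserver_groups):
    """Pair worker group i with server group i, one pair per process.

    Hypothetical helper for illustration only; it assumes the two counts
    are equal, as in the nworker_groups: 2 / nserver_groups: 2 example.
    """
    if nworker_groups != nserver_groups:
        raise ValueError("sketch assumes one server group per worker group")
    return [
        {"process": i, "worker_group": i, "server_group": i}
        for i in range(nworker_groups)
    ]


layout = colocated_layout(nworker_groups=2, nserver_groups=2)
for entry in layout:
    print(entry)
```

With the configuration above this yields two entries, one per process.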


> Improve Distributed Hogwild
> ---------------------------
>
>                 Key: SINGA-57
>                 URL: https://issues.apache.org/jira/browse/SINGA-57
>             Project: Singa
>          Issue Type: Improvement
>            Reporter: wangwei
>
> The implementation of distributed Hogwild from SINGA-8 uses the stub thread to 
> monitor the network bandwidth. When the network has available bandwidth (>0), 
> the stub sends a sync reminder message to a server, which triggers the server 
> to sync one Param slice with the other server groups.
> The code is messy because it has to monitor the network bandwidth and process 
> the sync reminder message. Another problem is that the reminder message may 
> not be generated at a steady rate, because it is generated only when the router 
> times out. If the workers and servers run so fast that the router rarely 
> times out, the sync reminder message is rarely sent. In contrast, if the 
> router times out frequently, many reminder messages are generated.
> This ticket improves the implementation by fixing the frequency of 
> synchronization between server groups. For every sync_freq updates to a 
> Param (slice), a server sends a sync message to the server group that 
> masters/maintains the Param.
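
A minimal Python sketch of the fixed-frequency rule in the quoted description, 
not the actual SINGA implementation. The class ServerSketch, the method 
on_update, and the send callback are hypothetical names. The sketch also 
assumes that the group mastering a slice does not send sync messages to itself.

```python
from collections import defaultdict


class ServerSketch:
    """Counts updates per Param slice and syncs every sync_freq updates."""

    def __init__(self, group_id, sync_freq, send):
        self.group_id = group_id
        self.sync_freq = sync_freq
        self.send = send  # hypothetical callback: send(dst_group, slice_id)
        self.update_counts = defaultdict(int)

    def on_update(self, slice_id, master_group):
        """Record one update; return True if a sync message was sent."""
        self.update_counts[slice_id] += 1
        if master_group == self.group_id:
            # Assumption: the master group has nothing to sync to itself.
            return False
        if self.update_counts[slice_id] % self.sync_freq == 0:
            self.send(master_group, slice_id)
            return True
        return False


sent = []
server = ServerSketch(group_id=1, sync_freq=3,
                      send=lambda dst, sid: sent.append((dst, sid)))
for _ in range(7):
    server.on_update(slice_id=0, master_group=0)
print(sent)
```

Here seven updates with sync_freq = 3 trigger sync messages on the 3rd and 6th 
update, independent of router timeouts.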



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
