[ https://issues.apache.org/jira/browse/SINGA-3?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14562217#comment-14562217 ]

Sheng Wang commented on SINGA-3:
--------------------------------

Pull Request #3 has been merged to add this feature.

> Use Zookeeper to check stopping (finish) time of the system
> -----------------------------------------------------------
>
>                 Key: SINGA-3
>                 URL: https://issues.apache.org/jira/browse/SINGA-3
>             Project: Singa
>          Issue Type: New Feature
>         Environment: Linux, gcc>4.8
>            Reporter: wangwei
>
> To stop each process (node), we need to stop both its local workers and
> servers. Worker threads exit when they finish all training steps. Server
> threads can exit only after all connected workers have stopped.
> We use Zookeeper to detect worker state. Specifically, the main thread of
> each process first registers all local servers with Zookeeper. It then
> registers each worker under a dedicated server group, where the worker's
> parameters are maintained. When a worker finishes execution, it
> de-registers from its server group (folder) in Zookeeper and tells the
> main thread about its state. When all workers registered in a server
> group have finished, the callback function registered for that group
> sends a stop message to the server. Upon receiving this message, the
> server tells the main thread about its state and stops. Once all local
> workers and local servers have finished, the main thread exits.
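The stop protocol described in the issue can be illustrated with a small, self-contained Python sketch. It is not SINGA's implementation and does not talk to a real ZooKeeper ensemble: `GroupRegistry` is a hypothetical in-memory stand-in for the per-server group folders and their watch callbacks, and `server`, `worker`, and `run_process` are likewise illustrative names. Threads play the roles of workers and servers, and queues carry stop messages to servers and status reports to the main thread.

```python
import queue
import threading


class GroupRegistry:
    """Hypothetical in-memory stand-in for Zookeeper group folders (NOT the
    ZooKeeper API). Each group holds worker ids; its callback fires when the
    last registered worker de-registers."""

    def __init__(self):
        self._lock = threading.Lock()
        self._members = {}   # group name -> set of registered worker ids
        self._on_empty = {}  # group name -> callback(group)

    def create_group(self, group, on_empty):
        with self._lock:
            self._members[group] = set()
            self._on_empty[group] = on_empty

    def register(self, group, worker_id):
        with self._lock:
            self._members[group].add(worker_id)

    def deregister(self, group, worker_id):
        with self._lock:
            members = self._members[group]
            members.discard(worker_id)
            callback = self._on_empty[group] if not members else None
        if callback is not None:
            callback(group)  # invoked outside the lock


def server(name, inbox, status):
    # Serve requests until a stop message arrives (request handling elided).
    while True:
        if inbox.get() == "stop":
            status.put(("server", name))  # report to the main thread
            return


def worker(wid, group, steps, registry, status):
    for _ in range(steps):
        pass  # one training step would run here
    registry.deregister(group, wid)  # leave the server group
    status.put(("worker", wid))      # report to the main thread


def run_process(groups, steps=3):
    """groups maps each local server name to the worker ids it serves."""
    registry = GroupRegistry()
    status = queue.Queue()
    inboxes = {name: queue.Queue() for name in groups}

    # 1. Register every local server as a group whose callback stops it.
    for name in groups:
        registry.create_group(name, on_empty=lambda g: inboxes[g].put("stop"))
    # 2. Register each worker under its server group before any thread runs.
    for name, wids in groups.items():
        for wid in wids:
            registry.register(name, wid)
        if not wids:
            inboxes[name].put("stop")  # a server with no workers stops at once

    threads = [threading.Thread(target=server, args=(n, inboxes[n], status))
               for n in groups]
    threads += [threading.Thread(target=worker,
                                 args=(w, n, steps, registry, status))
                for n, wids in groups.items() for w in wids]
    for t in threads:
        t.start()

    # 3. The main thread exits only after every worker and server reports.
    finished = [status.get() for _ in threads]
    for t in threads:
        t.join()
    return finished


finished = run_process({"server-0": ["w0", "w1"], "server-1": ["w2"]})
print(sorted(finished))
```

Registering every worker before any thread starts matters in this sketch: if a worker could de-register before its siblings had registered, the group would look empty too early and its server would be stopped prematurely. The order in which a server and its last worker report to the main thread is not fixed, which is why the usage line sorts the results.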



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
