wangwei created SINGA-3:
---------------------------

             Summary: Use Zookeeper to check stopping (finish) time of the system
                 Key: SINGA-3
                 URL: https://issues.apache.org/jira/browse/SINGA-3
             Project: Singa
          Issue Type: New Feature
         Environment: Linux, gcc>4.8
            Reporter: wangwei


To stop each process (node), we need to stop both its local workers and its
local servers. Worker threads exit once they finish all training steps. Server
threads can exit only after all workers connected to them have stopped.

We use ZooKeeper to detect worker states. Specifically, the main thread of each
process first registers all local servers with ZooKeeper. It then registers
each worker with a dedicated server group, which maintains that worker's
parameters. When a worker finishes execution, it deregisters from its server
group (folder) in ZooKeeper and reports its state to the main thread. When all
workers registered in a server group have finished, the callback function
registered for that group sends a stop message to the group's server(s). Upon
receiving this message, a server reports its state to the main thread and
stops. Once all local workers and local servers have finished, the main thread
exits.
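A minimal single-process sketch of the stop protocol described above, written in
Python for brevity. It is not SINGA's implementation and does not talk to a real
ZooKeeper ensemble: FakeZooKeeper, server, worker and run_process are
hypothetical names, and the in-memory registry only mimics a server-group folder
plus a callback that fires when the folder becomes empty. Queues stand in for the
stop message and for the state reports sent to the main thread. This sketch
assumes one server per group, and a group with no registered workers would never
fire its callback.

```python
import queue
import threading


class FakeZooKeeper:
    """Hypothetical in-memory stand-in for ZooKeeper (no real ZooKeeper API used).

    Each server group is a "folder" holding the ids of its registered workers.
    The callback registered for a group fires when that folder becomes empty.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._groups = {}     # group id -> set of registered worker ids
        self._callbacks = {}  # group id -> callback fired when the group empties

    def register_group(self, group, on_empty):
        with self._lock:
            self._groups[group] = set()
            self._callbacks[group] = on_empty

    def register_worker(self, group, worker):
        with self._lock:
            self._groups[group].add(worker)

    def deregister_worker(self, group, worker):
        with self._lock:
            members = self._groups[group]
            members.discard(worker)
            fire = not members  # only the last worker to leave sees an empty set
        if fire:
            self._callbacks[group]()  # run outside the lock


def server(group, inbox, status):
    inbox.get()                     # block until the group callback sends "stop"
    status.put(("server", group))   # report to the main thread, then exit


def worker(wid, group, steps, zk, status):
    for _ in range(steps):
        pass                        # placeholder for one training step
    zk.deregister_worker(group, wid)  # leave the server group folder
    status.put(("worker", wid))       # report to the main thread, then exit


def run_process(workers_per_group, steps=3):
    zk = FakeZooKeeper()
    status = queue.Queue()
    threads = []
    # 1. Register all local servers first (one server per group here).
    for group in workers_per_group:
        inbox = queue.Queue()
        zk.register_group(group, lambda inbox=inbox: inbox.put("stop"))
        threads.append(threading.Thread(target=server, args=(group, inbox, status)))
    # 2. Register every worker with its server group before any thread starts.
    for group, wids in workers_per_group.items():
        for wid in wids:
            zk.register_worker(group, wid)
            threads.append(threading.Thread(
                target=worker, args=(wid, group, steps, zk, status)))
    for t in threads:
        t.start()
    # 3. The main thread exits only after every local worker and server reports.
    finished = [status.get() for _ in threads]
    for t in threads:
        t.join()
    return finished
```

For example, with two workers in group g0 and one in g1, every thread reports
exactly once (the arrival order may vary between runs):

```python
events = run_process({"g0": ["w0", "w1"], "g1": ["w2"]})
print(sorted(events))
# [('server', 'g0'), ('server', 'g1'), ('worker', 'w0'), ('worker', 'w1'), ('worker', 'w2')]
```

Registering all workers before starting any thread matters: otherwise a fast
worker could empty its group folder and stop the server while other workers of
that group are still unregistered.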



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
