Technoboy- opened a new issue #1658: Refactor WorkerServer
URL: https://github.com/apache/incubator-dolphinscheduler/issues/1658
 
 
   # Background
     WorkerServer executes tasks by scanning ZK and the DB. When WorkerServer 
starts, it tries to acquire a lock in ZK and then executes tasks by loading 
data from the DB. This polling model is a poor fit for a distributed system, 
and the current implementation delays task execution.
   
   # Suggestion
     We want to use a TCP channel to refactor WorkerServer. 
    
   # General Implementation Idea
    1. Use Netty as our TCP framework. 
    2. MasterServer keeps its current logic; when it picks a task, it sends the 
task directly to the target worker using a round-robin policy.
    3. WorkerServer starts up as part of a predefined group and registers 
itself under a ZK node.
    4. WorkerServer starts a TCP server that listens on a port for tasks to 
execute, instead of scanning ZK and the DB.
    5. The execution result is sent back to the MasterServer node over the 
same channel that delivered the task.
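
The round-robin dispatch in step 2 could look roughly like the sketch below. It is only an illustration, not existing DolphinScheduler code: the class name `RoundRobinWorkerSelector`, the `select` method, and the use of plain address strings for workers are all assumptions. The list of live workers is assumed to come from the ZK registration described in step 3.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper: picks the next worker address in round-robin order. */
final class RoundRobinWorkerSelector {
    // Shared counter so concurrent dispatch threads rotate through workers.
    private final AtomicInteger counter = new AtomicInteger(0);

    /**
     * @param workers live worker addresses (assumed to be read from the ZK registry)
     * @return the next worker address in rotation
     */
    String select(List<String> workers) {
        if (workers == null || workers.isEmpty()) {
            throw new IllegalStateException("no live workers registered");
        }
        // floorMod keeps the index non-negative even after the counter overflows.
        int index = Math.floorMod(counter.getAndIncrement(), workers.size());
        return workers.get(index);
    }
}
```

Passing the current worker list on every call, rather than caching it in the selector, lets the rotation adapt when workers join or leave the group.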
    
   
   # General Failover Idea
     1. WorkerServer sends back an ack command only once it has received the 
task command, so that the task is confirmed as acknowledged.
     2. If the WorkerServer executes the task normally, it sends the result 
back over the same channel.
     3. If the WorkerServer dies after receiving a task, MasterServer pings the 
WorkerServer once the execution timeout elapses to check liveness. If the ping 
fails, it tries another worker node. In this case, the task may execute more 
than once.
     4. If the MasterServer dies after sending out a task, WorkerServer 
retries up to N times to rebuild the channel to the original MasterServer. If 
it still fails after those retries, it chooses a new MasterServer and sends the 
result there. The new MasterServer analyzes the task and decides the next step 
(stop or continue execution by instanceId/processId, or just update the status).
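
The worker-side behavior in failover item 4 can be sketched as follows. This is a simplified illustration under stated assumptions, not the proposed implementation: `ResultReporter`, `report`, and the `BiPredicate` standing in for the real Netty channel are hypothetical names, and the sketch tries the remaining masters in list order because the proposal does not say how the new MasterServer is chosen.

```java
import java.util.List;
import java.util.function.BiPredicate;

/** Hypothetical sketch: report a task result, falling back to another master. */
final class ResultReporter {
    private final int maxRetries;
    // Stand-in for the real channel: returns true if the given master accepted the result.
    private final BiPredicate<String, String> sender;

    ResultReporter(int maxRetries, BiPredicate<String, String> sender) {
        this.maxRetries = maxRetries;
        this.sender = sender;
    }

    /**
     * Tries the original master up to maxRetries times, then every other master once.
     *
     * @return the address of the master that accepted the result, or null if none did
     */
    String report(String originalMaster, List<String> allMasters, String result) {
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            if (sender.test(originalMaster, result)) {
                return originalMaster;
            }
        }
        // Original master is considered dead: pick a new one (assumed order: list order).
        for (String master : allMasters) {
            if (!master.equals(originalMaster) && sender.test(master, result)) {
                return master;
            }
        }
        return null;
    }
}
```

A real implementation would likely add a delay between retries and carry the instanceId/processId with the result so the new MasterServer can decide how to proceed.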
    

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
