huijunwu opened a new pull request #3162: keep executor running when `heron 
update` adds containers
URL: https://github.com/apache/incubator-heron/pull/3162
 
 
   The present `heron update` scale process:
   1. deactivate the topology
   2. call `scheduler add containers`, which returns the newly added container id
   3. update the repacking plan (replace the assumed container id with the actual 
container id)
   4. send the repacking plan to zk
   5. the heron-executor in the new container gets the new repacking plan and 
updates its heron-instances.
   
   Observed race condition
   The log showed that step 2 never returned, and the process then timed out.
   In step 2, `scheduler add containers` waits for the new container with 
`--wait-until running`. However, the heron-executor in the new container hit an 
assertion failure, so the scheduler considered the `--wait-until running` 
condition unsatisfied and stayed in the `waiting` status.
   On the other hand, until step 4 is performed, the heron-executor in the new 
container cannot recover.
   Thus step 2 depends on step 4, while step 4 depends on step 2; this circular 
dependency is the race condition.
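
The circular wait described above can be illustrated with a small, self-contained simulation. This is only a sketch under stated assumptions: `simulate_update`, the two events, and the `executor_tolerates_missing_plan` flag are hypothetical names, not Heron code, and the executor's crash-until-the-plan-arrives behaviour is simplified to a blocking wait.

```python
import threading


def simulate_update(executor_tolerates_missing_plan: bool, timeout: float = 0.2) -> bool:
    """Return True if step 2 (`scheduler add containers`) returns before the timeout.

    Hypothetical model of the update flow; not taken from the Heron sources.
    """
    plan_published = threading.Event()     # set once step 4 has sent the plan to zk
    container_running = threading.Event()  # what `--wait-until running` observes

    def new_container_executor() -> None:
        # Assumption: an intolerant executor is only healthy once the new plan exists.
        if executor_tolerates_missing_plan or plan_published.wait(timeout):
            container_running.set()

    worker = threading.Thread(target=new_container_executor)
    worker.start()

    # Step 2: block until the new container reports running, or give up.
    step2_returned = container_running.wait(timeout)
    if step2_returned:
        # Steps 3 and 4 can only happen after step 2 has returned.
        plan_published.set()

    worker.join()
    return step2_returned
```

With `executor_tolerates_missing_plan=False` both sides wait on each other until the timeout expires, which mirrors the hang seen in the log; with `True`, step 2 returns and the plan can be published.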
   
   This PR lets step 2 return by keeping the new container's executor running 
and healthy.
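
One way such a tolerant executor could look is sketched below. It is an illustration only, not the change made in this PR: `instances_for_container`, `reconcile`, and the dict-shaped packing plan are hypothetical, and the sketch simply treats a container that is missing from the plan as having no instances yet instead of asserting.

```python
def instances_for_container(packing_plan: dict, container_id: int) -> list:
    """Instances this container should run; empty if the plan does not list it yet.

    Hypothetical helper: a strict version would assert that `container_id`
    is present, which is the kind of check that aborts a freshly added container.
    """
    return list(packing_plan.get(container_id, []))


def reconcile(running: set, packing_plan: dict, container_id: int) -> tuple:
    """Compute which instances to start and stop for the current plan."""
    wanted = set(instances_for_container(packing_plan, container_id))
    to_start = sorted(wanted - running)
    to_stop = sorted(running - wanted)
    return to_start, to_stop


# Container 2 was just added, but the plan in zk still only lists container 1.
stale_plan = {1: ["bolt_1", "spout_1"]}
idle = reconcile(set(), stale_plan, container_id=2)

# After step 4 the plan lists container 2, and the same call starts its instances.
updated_plan = {1: ["spout_1"], 2: ["bolt_1"]}
started = reconcile(set(), updated_plan, container_id=2)
```

Under these assumptions the new container stays idle but alive while the stale plan is in place, so the `--wait-until running` check in step 2 can succeed, and it picks up its instances once step 4 publishes the updated plan.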

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
