huijunwu opened a new pull request #3162: keep executor running when `heron update` adds containers
URL: https://github.com/apache/incubator-heron/pull/3162

The present `heron update` scale process:

1. Deactivate the topology.
2. Call `scheduler add containers`, which returns the newly added container IDs.
3. Update the repacking plan (replace the assumed container IDs with the actual container IDs).
4. Send the repacking plan to ZooKeeper (zk).
5. The heron-executor in each new container gets the new repacking plan and updates its heron-instances.

Observed race condition

The log showed that step 2 never returned, so the process eventually timed out. In step 2, `scheduler add containers` waits until the new container is running (`--wait-until running`). However, the heron-executor in the new container hit an assertion failure, so the scheduler considered the `--wait-until running` condition unsatisfied and stayed in the `waiting` status. On the other hand, the new container's heron-executor cannot recover until step 4 is performed. Step 2 therefore depends on step 4, while step 4 depends on step 2: this circular dependency is the race condition.

This PR lets step 2 return by keeping the heron-executor in the new container running and healthy instead of failing.
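The circular wait between steps 2 and 4 can be reproduced in a small toy model. This is a hypothetical sketch, not Heron code: `StateStore` stands in for the ZooKeeper node that holds the plan, `Executor` and `add_containers` are simplified stand-ins for the new container's heron-executor and the scheduler's add-containers call, and the `keep_running` flag only models the behaviour change described in this PR. Step 1 (deactivation) is omitted, and step 3 is reduced to adding one entry to the plan.

```python
import threading
import time


class StateStore:
    """Hypothetical stand-in for the ZooKeeper node holding the packing plan."""

    def __init__(self, plan):
        self._plan = dict(plan)
        self._cond = threading.Condition()

    def get(self):
        with self._cond:
            return dict(self._plan)

    def publish(self, plan):
        # Step 4: write the updated plan and wake any waiting executors.
        with self._cond:
            self._plan = dict(plan)
            self._cond.notify_all()

    def wait_for(self, container_id, timeout):
        # Block until a plan containing `container_id` is published (or timeout).
        with self._cond:
            self._cond.wait_for(lambda: container_id in self._plan, timeout)
            return self._plan.get(container_id)


class Executor:
    """Hypothetical toy heron-executor for a newly added container."""

    def __init__(self, container_id, store, keep_running):
        self.container_id = container_id
        self.store = store
        self.keep_running = keep_running  # True models the behaviour this PR adds
        self.state = "pending"
        self.instances = None
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def join(self, timeout):
        self._thread.join(timeout)

    def _run(self):
        if self.container_id not in self.store.get() and not self.keep_running:
            # Old behaviour: the plan lacks this container, an assertion fails,
            # and the executor never reports "running".
            self.state = "failed"
            return
        # New behaviour: report healthy first, then wait for the updated plan
        # (step 5) instead of failing.
        self.state = "running"
        self.instances = self.store.wait_for(self.container_id, timeout=1.0)


def add_containers(executor, timeout):
    """Step 2: start the container and poll until it is running."""
    executor.start()
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if executor.state == "running":
            return executor.container_id
        time.sleep(0.01)
    raise TimeoutError("new container never reached the running state")


def scale_up(keep_running):
    store = StateStore({0: ["instance-a", "instance-b"]})  # existing plan
    executor = Executor(1, store, keep_running)
    container_id = add_containers(executor, timeout=0.2)   # step 2
    new_plan = store.get()
    new_plan[container_id] = ["instance-c"]                 # step 3
    store.publish(new_plan)                                 # step 4
    executor.join(timeout=1.0)                              # let step 5 finish
    return executor.instances
```

In this model, `scale_up(keep_running=False)` raises `TimeoutError` at step 2 because step 4 is never reached, mirroring the observed timeout, while `scale_up(keep_running=True)` lets step 2 return and the executor then picks up `["instance-c"]` from the published plan.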
