Andy Jiang created GOBBLIN-1781:
-----------------------------------
Summary: Helix offline instance purging is not thread safe in the
yarn service
Key: GOBBLIN-1781
URL: https://issues.apache.org/jira/browse/GOBBLIN-1781
Project: Apache Gobblin
Issue Type: Bug
Reporter: Andy Jiang
Offline Helix instances are purged during startup of the Yarn service. This
operation must run while no Helix instances are being added or removed (i.e. the
API call is not thread safe).
The current implementation blocks the Yarn service from allocating its initial
containers while Helix instance purging is enabled, but it does not prevent
other services from requesting containers through its public methods.
The Yarn service and the AutoScalingYarnManager start up concurrently, so the
AutoScalingYarnManager can start before the Yarn service has finished purging.
When that happens, the AutoScalingYarnManager calls requestContainers while the
instances are still being purged.
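
One possible way to close this race is to gate the public container-request
method on purge completion. The sketch below only illustrates that idea and is
not the actual Gobblin code: the class PurgeGatedYarnService, its method
signatures, and the timeout handling are all hypothetical, and the purge itself
is passed in as a plain Runnable.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the Yarn service; names and signatures are assumptions.
class PurgeGatedYarnService {
  // Opened exactly once, after the startup purge has finished (or failed).
  private final CountDownLatch purgeDone = new CountDownLatch(1);
  // Counts containers requested so far; stands in for the real allocation call.
  private final AtomicInteger requested = new AtomicInteger();

  // Runs the purge during startup, then opens the gate for container requests.
  void startUp(Runnable purgeOfflineInstances) {
    try {
      purgeOfflineInstances.run();
    } finally {
      purgeDone.countDown();
    }
  }

  // Public entry point used by other services (e.g. an auto-scaling manager).
  // Waits up to the given timeout for the purge to finish; returns false and
  // requests nothing if the purge is still running or the wait is interrupted.
  boolean requestContainers(int count, long timeout, TimeUnit unit) {
    try {
      if (!purgeDone.await(timeout, unit)) {
        return false;
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
    requested.addAndGet(count);
    return true;
  }

  int requestedContainers() {
    return requested.get();
  }
}
```

With a gate like this, a caller that starts early either waits for the purge to
complete or gets a failure it can retry, instead of adding instances while the
purge is in flight.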
--
This message was sent by Atlassian Jira
(v8.20.10#820010)