[jira] [Work logged] (GOBBLIN-1781) Helix offline instance purging is not thread safe in the yarn service

ASF GitHub Bot (Jira) Mon, 13 Feb 2023 16:58:10 -0800


     [ 
https://issues.apache.org/jira/browse/GOBBLIN-1781?focusedWorklogId=845250&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-845250
 ]


ASF GitHub Bot logged work on GOBBLIN-1781:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 14/Feb/23 00:57
            Start Date: 14/Feb/23 00:57
    Worklog Time Spent: 10m 
      Work Description: Will-Lo commented on code in PR #3638:
URL: https://github.com/apache/gobblin/pull/3638#discussion_r1105137753


##########
gobblin-yarn/src/main/java/org/apache/gobblin/yarn/YarnService.java:
##########
@@ -462,10 +464,16 @@ private EventSubmitter buildEventSubmitter() {
    *
    * @param yarnContainerRequestBundle the desired containers information, including numbers, resource and helix tag
    * @param inUseInstances  a set of in use instances
+   * @return whether the requestTargetNumberOfContainers function has executed yet

Review Comment:
   I would argue that the return value is whether or not the requested number 
of containers could actually be obtained after service initialization. 
Describing the execution itself is ambiguous, since the function still 
executes when you return false.





Issue Time Tracking
-------------------

    Worklog Id:     (was: 845250)
    Time Spent: 1.5h  (was: 1h 20m)

> Helix offline instance purging is not thread safe in the yarn service
> ---------------------------------------------------------------------
>
>                 Key: GOBBLIN-1781
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-1781
>             Project: Apache Gobblin
>          Issue Type: Bug
>            Reporter: Andy Jiang
>            Priority: Major
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Helix instances are purged during startup of the yarn service. This operation 
> must be done without new helix instances being added or removed (i.e. the API 
> call is not thread safe).
>  
> The current implementation blocks the yarn service from allocating initial 
> containers while the helix instance purging is enabled, but it does not 
> prevent other external services from requesting containers through its public 
> methods.
> These 2 services start up concurrently, and it's possible that the 
> AutoScalingYarnManager starts up before the Yarn Service has completely 
> finished purging. This leads the AutoScalingYarnManager to request 
> containers while the instances are still being purged.
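
One way to picture the race described above is a minimal Java sketch in which
container requests are gated on a startup flag guarded by the same lock as
the purge. This is only an illustration under stated assumptions, not the
actual YarnService code: the class PurgeGatedRequester, the method
purgeAndFinishStartup, and the startupInProgress field are hypothetical
names, and only requestTargetNumberOfContainers is taken from the diff (its
real signature takes a request bundle and a set of in-use instances, not an
int).

```java
/** Hypothetical sketch; not the actual YarnService implementation. */
public class PurgeGatedRequester {
  // True until the startup purge of offline Helix instances has finished.
  private boolean startupInProgress = true;
  // Stand-in for the state that a real container request would change.
  private int requestedContainers = 0;

  /** Runs the (stand-in) purge and then opens the gate, all under one lock. */
  public synchronized void purgeAndFinishStartup(Runnable purge) {
    purge.run();
    startupInProgress = false;
  }

  /**
   * Simplified, assumed signature for illustration.
   *
   * @return true if the request was accepted; false if it was skipped
   *         because the startup purge has not finished yet
   */
  public synchronized boolean requestTargetNumberOfContainers(int target) {
    if (startupInProgress) {
      return false; // caller (e.g. an autoscaler) is expected to retry later
    }
    requestedContainers = target;
    return true;
  }

  public synchronized int getRequestedContainers() {
    return requestedContainers;
  }
}
```

Because both methods synchronize on the same object, a request that arrives
while the purge is running waits for it to finish, and a request that
arrives before the purge has started is rejected with false rather than
interleaving with it.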



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
