[ https://issues.apache.org/jira/browse/GOBBLIN-1624?focusedWorklogId=749037&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-749037 ]
ASF GitHub Bot logged work on GOBBLIN-1624:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 29/Mar/22 00:36
            Start Date: 29/Mar/22 00:36
    Worklog Time Spent: 10m
      Work Description: umustafi commented on a change in pull request #3481:
URL: https://github.com/apache/gobblin/pull/3481#discussion_r836957917

##########
File path: gobblin-service/src/main/java/org/apache/gobblin/service/modules/orchestration/DagManager.java
##########
@@ -382,12 +365,31 @@ public synchronized void setActive(boolean active) {
                 ServiceMetricNames.FAILED_FLOW_METER));
         }

+        GobblinServiceQuotaManager quotaManager = new GobblinServiceQuotaManager(config);
+        // Before initializing the DagManagerThreads check which dags are currently running before shutdown
+        Set<String> runningDags = new ConcurrentHashSet<>();
+        for (Dag<JobExecutionPlan> dag: dagStateStore.getDags()) {
+          for (DagNode<JobExecutionPlan> dagNode: dag.getNodes()) {
+            if (DagManagerUtils.getExecutionStatus(dagNode) == RUNNING) {
+              runningDags.add(DagManagerUtils.generateDagId(dagNode));
+              // Add all the currently running Dags to the quota limit per user
+              try {
+                quotaManager.checkQuota(dagNode);
+              } catch (IOException e) {
+                // Quota is somehow exceeded with currently running jobs, we should never hit this state normally
+                // but we should avoid stalling the entire service
+                log.error(String.format("Quota exceeded during initialization in DagManager for job name: %s",

Review comment:
this is exceeding quota of entire dag manager not per user quota right?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: dev-unsubscr...@gobblin.apache.org
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

    Worklog Id:     (was: 749037)
    Time Spent: 1h  (was: 50m)

> Gobblin as a Service does not emit correct running job metrics and quotas in
> some edge cases
> --------------------------------------------------------------------------------------------
>
>                 Key: GOBBLIN-1624
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-1624
>             Project: Apache Gobblin
>          Issue Type: Task
>            Reporter: William Lo
>            Priority: Major
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> With the DagManager class in GaaS, during a rollout or leader swap it is
> possible to emit an inaccurate count of running jobs, and inaccurate quotas
> for those running jobs.
> For example, if the leader is shut down while keeping track of 10 running
> jobs, and during restart 5 of these jobs completed, the leader would emit
> that 0 jobs are currently running since it would not treat the job counters
> as idempotent. Additionally, we over-decrement because we do not
> differentiate jobs that fail while running on the executor from jobs that
> fail on the GaaS side.
> We should track currently running jobs more accurately, so that we only
> decrement counters/quotas for jobs that are actually running on the executor,
> and so that the tracked state stays accurate across restarts.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
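
The issue description above asks for counters that behave idempotently and are only decremented for jobs that were actually being tracked. One way this could look is sketched below. It is a minimal illustration, not the change made in pull request #3481: the class RunningJobTracker and its methods are hypothetical names, and it assumes each running job can be identified by a user plus a unique job id. The running count is derived from a set of job ids instead of a bare integer, so a repeated "started" or "finished" event cannot move the count twice.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch of idempotent running-job tracking.
 * Not part of Gobblin; names and structure are assumptions for illustration.
 */
class RunningJobTracker {
  // user -> ids of jobs currently believed to be running on the executor
  private final Map<String, Set<String>> runningByUser = new ConcurrentHashMap<>();

  /** Registers a running job. Returns true only the first time the id is seen. */
  boolean markRunning(String user, String jobId) {
    return runningByUser
        .computeIfAbsent(user, u -> ConcurrentHashMap.newKeySet())
        .add(jobId);
  }

  /**
   * Removes a job from tracking. Returns true only if the job was tracked,
   * so a job that never started on the executor, or a duplicate completion
   * event, does not decrement anything.
   */
  boolean markFinished(String user, String jobId) {
    Set<String> jobs = runningByUser.get(user);
    return jobs != null && jobs.remove(jobId);
  }

  /** Number of tracked running jobs for one user. */
  int runningCount(String user) {
    Set<String> jobs = runningByUser.get(user);
    return jobs == null ? 0 : jobs.size();
  }

  /** Number of tracked running jobs across all users. */
  int totalRunning() {
    return runningByUser.values().stream().mapToInt(Set::size).sum();
  }
}
```

On startup, a new leader would call markRunning once for every job that the persisted state reports as RUNNING, which rebuilds the counts from scratch. A completion event that arrives later for a job that had already finished before the restart is then a no-op, because markFinished finds nothing to remove.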