[
https://issues.apache.org/jira/browse/GOBBLIN-1797?focusedWorklogId=850148&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-850148
]
ASF GitHub Bot logged work on GOBBLIN-1797:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 09/Mar/23 19:01
Start Date: 09/Mar/23 19:01
Worklog Time Spent: 10m
Work Description: umustafi commented on code in PR #3656:
URL: https://github.com/apache/gobblin/pull/3656#discussion_r1131470132
##########
gobblin-service/src/main/java/org/apache/gobblin/service/modules/scheduler/GobblinServiceJobScheduler.java:
##########
@@ -292,17 +355,23 @@ private void scheduleSpecsFromCatalog() {
Spec spec = batchOfSpecsIterator.next();
try {
addSpecHelperMethod(spec);
- urisLeftToSchedule.remove(spec.getUri());
+ totalAddSpecTime += this.eachCompleteAddSpecValue; // this is updated by each call to onAddSpec
+ actualNumFlowsScheduled += 1;
} catch (Exception e) {
// If there is an uncaught error thrown during compilation, log it and continue adding flows
_log.error("Could not schedule spec {} from flowCatalog due to ", spec, e);
}
+ urisLeftToSchedule.remove(spec.getUri());
}
startOffset += this.loadSpecsBatchSize;
- // This count is used to ensure the average spec get time is calculated accurately for the last batch which may be
- // smaller than the loadSpecsBatchSize
- averageGetSpecTimeValue = (batchGetEndTime - batchGetStartTime) / batchOfSpecs.size();
+ totalGetTime += batchGetEndTime - batchGetStartTime;
+ // Don't skew the average get spec time value with the last batch that may be very small
+ if (batchOfSpecs.size() == this.loadSpecsBatchSize) {
Review Comment:
These are good callouts. In practice I expect only the last batch to fall
short of the batch size, but it's better to be robust in case the batch
size is set so large that a single batch covers all specs.
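
One way to read the robustness concern above, as a minimal sketch rather than the code in the PR: average the per-spec get time over full batches only, and fall back to all batches when no batch reaches the configured size (for example, when the batch size exceeds the total number of specs). The class `GetSpecTimeAverager`, the method `averageMillisPerSpec`, and the fallback behavior are all hypothetical and are not taken from Gobblin.

```java
import java.util.List;

// Hypothetical helper, not part of Gobblin: averages per-spec fetch time across batches.
final class GetSpecTimeAverager {
  private GetSpecTimeAverager() {}

  /**
   * @param batchSizes number of specs returned by each batch fetch
   * @param batchMillis elapsed time of each batch fetch, in the same order as batchSizes
   * @param configuredBatchSize the configured batch size (loadSpecsBatchSize in the diff)
   * @return average milliseconds per spec, or 0.0 if no specs were fetched at all
   */
  static double averageMillisPerSpec(List<Integer> batchSizes, List<Long> batchMillis,
      int configuredBatchSize) {
    if (batchSizes.size() != batchMillis.size()) {
      throw new IllegalArgumentException("batchSizes and batchMillis must have the same length");
    }
    long fullSpecs = 0;
    long fullMillis = 0;
    long allSpecs = 0;
    long allMillis = 0;
    for (int i = 0; i < batchSizes.size(); i++) {
      int size = batchSizes.get(i);
      long millis = batchMillis.get(i);
      allSpecs += size;
      allMillis += millis;
      // Only full batches count toward the primary average, so a tiny last batch cannot skew it.
      if (size == configuredBatchSize) {
        fullSpecs += size;
        fullMillis += millis;
      }
    }
    if (fullSpecs > 0) {
      return (double) fullMillis / fullSpecs;
    }
    // Assumed fallback: no full batch was seen, so use whatever was fetched.
    if (allSpecs > 0) {
      return (double) allMillis / allSpecs;
    }
    return 0.0;
  }

  public static void main(String[] args) {
    // Two full batches of 100 specs and a short final batch of 3 specs.
    double avg = averageMillisPerSpec(List.of(100, 100, 3), List.of(1000L, 3000L, 300L), 100);
    System.out.println("average ms per spec: " + avg);
  }
}
```

The fallback branch is the part that addresses the oversized-batch case: without it, a run whose only batch is smaller than the configured size would report no average at all.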
Issue Time Tracking
-------------------
Worklog Id: (was: 850148)
Time Spent: 1h 40m (was: 1.5h)
> Skip scheduling flows far into future
> -------------------------------------
>
> Key: GOBBLIN-1797
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1797
> Project: Apache Gobblin
> Issue Type: Improvement
> Components: gobblin-service
> Reporter: Urmi Mustafi
> Assignee: Abhishek Tiwari
> Priority: Major
> Time Spent: 1h 40m
> Remaining Estimate: 0h
>
> The unschedule feature linked below sets a flow's schedule to January 1st, 2050,
> so far in advance that it will "never run":
> [https://jarvis.corp.linkedin.com/codesearch/result/?name=FlowConfigResourceLocalHandler.java&path=gobblin-elr%2Fgobblin-restli%2Fgobblin-flow-config-service%2Fgobblin-flow-config-service-server%2Fsrc%2Fmain%2Fjava%2Forg%2Fapache%2Fgobblin%2Fservice&reponame=linkedin%2Fgobblin-elr#62]
> There are potentially over 100k of these flows, so we are loading and
> scheduling many flows unnecessarily. On initialization we add a check that
> verifies the next run of the flow is within a certain time frame (100 days by
> default) and loads the flow into the scheduler only if it is. We chose that
> default under the assumption that we will redeploy GaaS at least every 100
> days, so a flow scheduled far out will be loaded into the scheduler once its
> run time approaches. In most cases, however, users schedule flows for the
> near future or immediately, and those will all be scheduled. This PR also
> renames metrics and adds helpful new ones.
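
The initialization check described above can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: the class `FarFutureScheduleFilter`, the method `shouldSchedule`, and the constant name are hypothetical, and the flow's next run time is assumed to have been computed already (how the scheduler derives it is not shown in this message).

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper, not part of Gobblin: decides whether a flow is worth scheduling now.
final class FarFutureScheduleFilter {
  // 100 days is the default threshold named in the issue description.
  static final int DEFAULT_THRESHOLD_DAYS = 100;

  private FarFutureScheduleFilter() {}

  /**
   * Returns true when the flow's next run falls on or before now + thresholdDays,
   * meaning it should be loaded into the scheduler during initialization.
   */
  static boolean shouldSchedule(Instant nextRun, Instant now, int thresholdDays) {
    Instant cutoff = now.plus(Duration.ofDays(thresholdDays));
    return !nextRun.isAfter(cutoff);
  }

  public static void main(String[] args) {
    Instant now = Instant.now();
    // A flow "unscheduled" by pushing its schedule out to January 1st, 2050.
    Instant unscheduled = Instant.parse("2050-01-01T00:00:00Z");
    System.out.println("schedule 2050 flow: "
        + shouldSchedule(unscheduled, now, DEFAULT_THRESHOLD_DAYS));
    // A flow due tomorrow.
    Instant tomorrow = now.plus(Duration.ofDays(1));
    System.out.println("schedule tomorrow's flow: "
        + shouldSchedule(tomorrow, now, DEFAULT_THRESHOLD_DAYS));
  }
}
```

Because the check runs at initialization, a skipped flow is reconsidered on the next service restart, which is why the threshold is tied to the assumed redeploy interval.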
--
This message was sent by Atlassian Jira
(v8.20.10#820010)