[
https://issues.apache.org/jira/browse/GOBBLIN-1797?focusedWorklogId=849947&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-849947
]
ASF GitHub Bot logged work on GOBBLIN-1797:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 09/Mar/23 02:13
Start Date: 09/Mar/23 02:13
Worklog Time Spent: 10m
Work Description: phet commented on code in PR #3656:
URL: https://github.com/apache/gobblin/pull/3656#discussion_r1130303841
##########
gobblin-service/src/main/java/org/apache/gobblin/service/modules/scheduler/GobblinServiceJobScheduler.java:
##########
@@ -312,15 +380,21 @@ private void scheduleSpecsFromCatalog() {
try {
individualGetSpecStartTime = System.nanoTime();
Spec spec = this.flowCatalog.get().getSpecWrapper(uri);
- this.individualGetSpecSpeedNanosValue = System.nanoTime() - individualGetSpecStartTime;
+ this.individualGetSpecSpeedValue = System.nanoTime() - individualGetSpecStartTime;
+ totalGetTime += this.individualGetSpecSpeedValue;
addSpecHelperMethod(spec);
+ totalAddSpecTime += this.eachCompleteAddSpecValue; // this is updated by each call to onAddSpec
+ actualNumFlowsScheduled += 1;
} catch (Exception e) {
// If there is an uncaught error thrown during compilation, log it and continue adding flows
_log.error("Could not schedule spec uri {} from flowCatalog due to ", uri, e);
}
-
}
+ this.individualGetSpecSpeedValue = -1L;
+ this.totalGetSpecTimeValue = totalGetTime;
+ this.totalAddSpecTimeValue = totalAddSpecTime;
Review Comment:
I see your point on the logging, but here is what I like about emitting this as a metric:
a. the value is preserved for just as long as other metrics are (logs are
generally retained for a shorter period)
b. it is easy to align the time series for this metric with the other related ones
c. the value can be found in the same place as the other metrics we're
looking at (no need to pause reading those metrics to go fetch this
value from the logs)
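To make the pattern in the diff concrete, here is a minimal sketch of accumulating per-spec timings and exposing the totals as gauge-like values. The class name, method signature, and the use of `LongSupplier` as a stand-in for a gauge are assumptions for illustration only; the actual PR uses Gobblin's own metric classes, which are not shown in this excerpt.

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical sketch; not the Gobblin implementation. */
final class SchedulingTimings {
  // -1 signals "not measured yet", mirroring the -1L reset seen in the diff.
  private volatile long totalGetNanos = -1L;
  private volatile long totalAddNanos = -1L;

  // Stand-ins for gauges: a metrics reporter would poll these suppliers.
  final LongSupplier totalGetNanosGauge = () -> totalGetNanos;
  final LongSupplier totalAddNanosGauge = () -> totalAddNanos;

  /** Loads and adds each spec, returning how many were scheduled successfully. */
  int scheduleAll(List<String> uris,
                  Function<String, Object> getSpec,
                  Consumer<Object> addSpec) {
    long getSum = 0L;
    long addSum = 0L;
    int scheduled = 0;
    for (String uri : uris) {
      try {
        long start = System.nanoTime();
        Object spec = getSpec.apply(uri);
        long afterGet = System.nanoTime();
        addSpec.accept(spec);
        long afterAdd = System.nanoTime();
        getSum += afterGet - start;
        addSum += afterAdd - afterGet;
        scheduled++;
      } catch (RuntimeException e) {
        // One failing spec should not stop the remaining specs from being scheduled.
      }
    }
    // Publish the totals once, after the loop, so the gauges report a complete pass.
    totalGetNanos = getSum;
    totalAddNanos = addSum;
    return scheduled;
  }
}
```

Because the totals sit behind gauges, they are reported and retained alongside the other scheduler metrics rather than only appearing in a log line.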
Issue Time Tracking
-------------------
Worklog Id: (was: 849947)
Time Spent: 1h 20m (was: 1h 10m)
> Skip scheduling flows far into future
> -------------------------------------
>
> Key: GOBBLIN-1797
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1797
> Project: Apache Gobblin
> Issue Type: Improvement
> Components: gobblin-service
> Reporter: Urmi Mustafi
> Assignee: Abhishek Tiwari
> Priority: Major
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> The unschedule feature linked below sets a flow's schedule to run on Jan 1st,
> 2050, so far in advance that it will "never run":
> [https://jarvis.corp.linkedin.com/codesearch/result/?name=FlowConfigResourceLocalHandler.java&path=gobblin-elr%2Fgobblin-restli%2Fgobblin-flow-config-service%2Fgobblin-flow-config-service-server%2Fsrc%2Fmain%2Fjava%2Forg%2Fapache%2Fgobblin%2Fservice&reponame=linkedin%2Fgobblin-elr#62]
> There are potentially over 100k of these flows, so we are loading and
> scheduling many unnecessary flows. On initialization we add a check that
> verifies the next run of the flow is within a certain time frame (100 days by
> default) and loads the flow into the scheduler only if it is. We choose that
> default under the assumption that we will redeploy GaaS at least every 100
> days, so a flow scheduled far out will be loaded into the scheduler on a
> later redeploy once its next run approaches. In most cases, however, users
> schedule flows for the near future or to run immediately, and those will all
> be scheduled. This PR also renames metrics and adds helpful new ones.
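
The initialization check described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the PR: the class and method names are hypothetical, and how the next run time is derived from a flow's schedule is left out.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper; names are illustrative and not taken from the Gobblin codebase. */
final class FarFutureScheduleFilter {
  // Default threshold mentioned in the issue description: 100 days.
  static final Duration DEFAULT_THRESHOLD = Duration.ofDays(100);

  private final Duration threshold;

  FarFutureScheduleFilter(Duration threshold) {
    this.threshold = threshold;
  }

  /** Returns true if the next run falls no later than now + threshold. */
  boolean shouldSchedule(Instant now, Instant nextRun) {
    return !nextRun.isAfter(now.plus(threshold));
  }
}
```

With the default threshold, a flow whose next run is Jan 1st, 2050 is skipped at startup, while a flow due tomorrow is loaded into the scheduler as usual.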
--
This message was sent by Atlassian Jira
(v8.20.10#820010)