[ 
https://issues.apache.org/jira/browse/GOBBLIN-1797?focusedWorklogId=850141&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-850141
 ]

ASF GitHub Bot logged work on GOBBLIN-1797:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 09/Mar/23 18:30
            Start Date: 09/Mar/23 18:30
    Worklog Time Spent: 10m 
      Work Description: umustafi commented on code in PR #3656:
URL: https://github.com/apache/gobblin/pull/3656#discussion_r1131439428


##########
gobblin-service/src/main/java/org/apache/gobblin/service/modules/scheduler/GobblinServiceJobScheduler.java:
##########
@@ -231,17 +247,59 @@ public void run() {
     }
   }
 
-  /** Helps modify spec before adding to scheduler for adhoc flows */
+  /** Check that a spec should be scheduled and if it is, modify the spec of an adhoc flow before adding to scheduler*/
   private void addSpecHelperMethod(Spec spec) {
-    // Disable FLOW_RUN_IMMEDIATELY on service startup or leadership change if the property is set to true
-    if (spec instanceof FlowSpec && PropertiesUtils
-        .getPropAsBoolean(((FlowSpec) spec).getConfigAsProperties(), ConfigurationKeys.FLOW_RUN_IMMEDIATELY,
-            "false")) {
-      Spec modifiedSpec = disableFlowRunImmediatelyOnStart((FlowSpec) spec);
-      onAddSpec(modifiedSpec);
+    // Adhoc flows will not have any job schedule key, but we should schedule them
+    FlowSpec flowSpec = (FlowSpec) spec;
+    if (!flowSpec.getConfig().hasPath(ConfigurationKeys.JOB_SCHEDULE_KEY)
+        || isNextRunWithinRangeToSchedule(flowSpec.getConfig().getString(ConfigurationKeys.JOB_SCHEDULE_KEY),
+        this.thresholdToSkipSchedulingFlowsAfter)) {
+      // Disable FLOW_RUN_IMMEDIATELY on service startup or leadership change if the property is set to true
+      if (spec instanceof FlowSpec && PropertiesUtils.getPropAsBoolean(((FlowSpec) spec).getConfigAsProperties(),
+          ConfigurationKeys.FLOW_RUN_IMMEDIATELY, "false")) {
+        Spec modifiedSpec = disableFlowRunImmediatelyOnStart((FlowSpec) spec);
+        onAddSpec(modifiedSpec);
+      } else {
+        onAddSpec(spec);
+      }
     } else {
-      onAddSpec(spec);
+      _log.info("Not scheduling spec {} during startup as next job to schedule is outside of threshold.", spec);

Review Comment:
   That's true. We may want to turn this on if we notice a bug, so I'll update 
it to a `debug`-level log in case we see any missed flows.





Issue Time Tracking
-------------------

    Worklog Id:     (was: 850141)
    Time Spent: 1.5h  (was: 1h 20m)

> Skip scheduling flows far into future
> -------------------------------------
>
>                 Key: GOBBLIN-1797
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-1797
>             Project: Apache Gobblin
>          Issue Type: Improvement
>          Components: gobblin-service
>            Reporter: Urmi Mustafi
>            Assignee: Abhishek Tiwari
>            Priority: Major
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The unschedule feature linked below sets a flow's schedule to run on Jan 1st, 
> 2050, so far in advance that it will "never run": 
> [https://jarvis.corp.linkedin.com/codesearch/result/?name=FlowConfigResourceLocalHandler.java&path=gobblin-elr%2Fgobblin-restli%2Fgobblin-flow-config-service%2Fgobblin-flow-config-service-server%2Fsrc%2Fmain%2Fjava%2Forg%2Fapache%2Fgobblin%2Fservice&reponame=linkedin%2Fgobblin-elr#62]
> Potentially there are over 100k of these flows, so we are loading and 
> scheduling many unnecessary flows. On initialization we add a check that 
> verifies the next run of the flow is within a certain time frame (100 days by 
> default) and load the flow into the scheduler only if it is. We choose that 
> default under the assumption that we will redeploy GaaS at least every 100 
> days, so as a far-out scheduled flow approaches, a later redeploy will load 
> it into the scheduler. In most cases, however, users schedule flows for the 
> near future or immediately, and those will all be scheduled. This PR also 
> renames metrics and adds helpful new ones. 
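
The check described in the issue can be sketched roughly as follows. This is a minimal illustration and not the code from PR #3656: the class name `ScheduleWindowSketch`, both method names, and the `DEFAULT_THRESHOLD_DAYS` constant are hypothetical, and the sketch assumes the flow's next run time has already been computed from its schedule (the diff above instead passes the raw `JOB_SCHEDULE_KEY` string to `isNextRunWithinRangeToSchedule`). An adhoc flow with no schedule is modeled as `Optional.empty()`.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Optional;

public class ScheduleWindowSketch {
  // Hypothetical constant mirroring the "100 days by default" mentioned in the issue.
  public static final int DEFAULT_THRESHOLD_DAYS = 100;

  /** Returns true when nextRun is no later than now + thresholdDays. */
  public static boolean isNextRunWithinRange(Instant nextRun, Instant now, int thresholdDays) {
    Instant cutoff = now.plus(Duration.ofDays(thresholdDays));
    return !nextRun.isAfter(cutoff);
  }

  /**
   * Adhoc flows have no schedule (modeled here as Optional.empty()) and are
   * always scheduled; scheduled flows are loaded only if their next run falls
   * inside the threshold window.
   */
  public static boolean shouldSchedule(Optional<Instant> nextRun, Instant now, int thresholdDays) {
    return !nextRun.isPresent() || isNextRunWithinRange(nextRun.get(), now, thresholdDays);
  }

  public static void main(String[] args) {
    Instant now = Instant.parse("2023-03-09T18:30:00Z");
    Instant nextWeek = now.plus(Duration.ofDays(7));
    Instant unscheduled = Instant.parse("2050-01-01T00:00:00Z");

    // Adhoc flow: no schedule, so it is scheduled.
    System.out.println(shouldSchedule(Optional.empty(), now, DEFAULT_THRESHOLD_DAYS));         // true
    // Next run in 7 days: inside the 100-day window.
    System.out.println(shouldSchedule(Optional.of(nextWeek), now, DEFAULT_THRESHOLD_DAYS));    // true
    // "Unscheduled" flow parked at Jan 1st, 2050: skipped at startup.
    System.out.println(shouldSchedule(Optional.of(unscheduled), now, DEFAULT_THRESHOLD_DAYS)); // false
  }
}
```

Passing `now` in explicitly, rather than reading the clock inside the method, keeps the window check deterministic and easy to test.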



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
