[ 
https://issues.apache.org/jira/browse/GOBBLIN-1797?focusedWorklogId=849904&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-849904
 ]

ASF GitHub Bot logged work on GOBBLIN-1797:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 08/Mar/23 20:00
            Start Date: 08/Mar/23 20:00
    Worklog Time Spent: 10m 
      Work Description: ZihanLi58 commented on code in PR #3656:
URL: https://github.com/apache/gobblin/pull/3656#discussion_r1129951496


##########
gobblin-api/src/main/java/org/apache/gobblin/configuration/ConfigurationKeys.java:
##########
@@ -90,7 +90,9 @@ public class ConfigurationKeys {
   public static final String JOB_RETRIGGERING_ENABLED = "job.retriggering.enabled";
   public static final String DEFAULT_JOB_RETRIGGERING_ENABLED = "true";
   public static final String LOAD_SPEC_BATCH_SIZE = "load.spec.batch.size";
-  public static final int DEFAULT_LOAD_SPEC_BATCH_SIZE = 100;
+  public static final int DEFAULT_LOAD_SPEC_BATCH_SIZE = 500;
+  public static final String SKIP_SCHEDULING_FLOWS_AFTER_NUM_DAYS = "skip.scheduling.flows.after.num.days";
+  public static final int DEFAULT_NUM_DAYS_TO_SKIP_AFTER = 100;

Review Comment:
   Should we make the default one year, to be safer?



##########
gobblin-service/src/main/java/org/apache/gobblin/service/modules/scheduler/GobblinServiceJobScheduler.java:
##########
@@ -290,19 +345,30 @@ private void scheduleSpecsFromCatalog() {
 
       while (batchOfSpecsIterator.hasNext()) {
         Spec spec = batchOfSpecsIterator.next();
-        try {
-          addSpecHelperMethod(spec);
-          urisLeftToSchedule.remove(spec.getUri());
-        } catch (Exception e) {
-          // If there is an uncaught error thrown during compilation, log it and continue adding flows
-          _log.error("Could not schedule spec {} from flowCatalog due to ", spec, e);
+        FlowSpec flowSpec = (FlowSpec) spec;
+        String cronExpression = flowSpec.getConfig().getString(ConfigurationKeys.JOB_SCHEDULE_KEY);
+        // Empty string cron expressions should be scheduled by default
+        if (isNextRunWithinRangeToSchedule(cronExpression, this.thresholdToSkipSchedulingFlowsAfter)) {

Review Comment:
   Do you want to remove it from urisLeftToSchedule even if it's not scheduled 
because of the junk flowspec?





Issue Time Tracking
-------------------

    Worklog Id:     (was: 849904)
    Time Spent: 0.5h  (was: 20m)

> Skip scheduling flows far into future
> -------------------------------------
>
>                 Key: GOBBLIN-1797
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-1797
>             Project: Apache Gobblin
>          Issue Type: Improvement
>          Components: gobblin-service
>            Reporter: Urmi Mustafi
>            Assignee: Abhishek Tiwari
>            Priority: Major
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The unschedule feature linked below sets a flow's schedule to Jan 1st, 2050, 
> far enough in the future that it will "never run": 
> [https://jarvis.corp.linkedin.com/codesearch/result/?name=FlowConfigResourceLocalHandler.java&path=gobblin-elr%2Fgobblin-restli%2Fgobblin-flow-config-service%2Fgobblin-flow-config-service-server%2Fsrc%2Fmain%2Fjava%2Forg%2Fapache%2Fgobblin%2Fservice&reponame=linkedin%2Fgobblin-elr#62]
> However, there are potentially over 100k of these flows, so we are loading 
> and scheduling many flows unnecessarily. On initialization we add a check 
> that verifies that the next run of the flow is within a certain time frame 
> (100 days by default) and loads the flow into the scheduler only if it is. 
> We chose that default under the assumption that we will redeploy GaaS at 
> least every 100 days, so a flow scheduled far out will be loaded into the 
> scheduler by a later redeploy as its run date approaches. In most cases, 
> users schedule flows for the near future or to run immediately, and those 
> will all be scheduled. This PR also renames metrics and adds helpful new ones. 
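
The time-frame check described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the class and method names are hypothetical, and the next run time is passed in as a precomputed Instant so the example stays dependency-free, whereas the method in the diff takes the cron expression itself. Treating a missing next run as "schedule it" is an assumption modeled on the diff comment about empty cron expressions.

```java
import java.time.Duration;
import java.time.Instant;

public class ScheduleWindowSketch {

  /**
   * Hypothetical helper: returns true when the flow's next run is no later
   * than maxDaysAhead days after now, i.e. the flow should be scheduled.
   */
  static boolean isWithinRangeToSchedule(Instant now, Instant nextRun, int maxDaysAhead) {
    // Assumption: a flow with no known next run (for example an empty cron
    // expression) is scheduled by default.
    if (nextRun == null) {
      return true;
    }
    Instant cutoff = now.plus(Duration.ofDays(maxDaysAhead));
    return !nextRun.isAfter(cutoff);
  }

  public static void main(String[] args) {
    Instant now = Instant.parse("2023-03-08T20:00:00Z");
    Instant nearRun = Instant.parse("2023-03-09T00:00:00Z");
    Instant unscheduledRun = Instant.parse("2050-01-01T00:00:00Z");

    System.out.println(isWithinRangeToSchedule(now, nearRun, 100));        // prints "true"
    System.out.println(isWithinRangeToSchedule(now, unscheduledRun, 100)); // prints "false"
  }
}
```

With a 100-day window, a flow "unscheduled" to Jan 1st, 2050 is skipped at startup, while a flow due the next day is loaded as before.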



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
