[ 
https://issues.apache.org/jira/browse/GOBBLIN-1797?focusedWorklogId=849922&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-849922
 ]

ASF GitHub Bot logged work on GOBBLIN-1797:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 08/Mar/23 22:28
            Start Date: 08/Mar/23 22:28
    Worklog Time Spent: 10m 
      Work Description: umustafi commented on code in PR #3656:
URL: https://github.com/apache/gobblin/pull/3656#discussion_r1130141949


##########
gobblin-api/src/main/java/org/apache/gobblin/configuration/ConfigurationKeys.java:
##########
@@ -90,7 +90,9 @@ public class ConfigurationKeys {
   public static final String JOB_RETRIGGERING_ENABLED = "job.retriggering.enabled";
   public static final String DEFAULT_JOB_RETRIGGERING_ENABLED = "true";
   public static final String LOAD_SPEC_BATCH_SIZE = "load.spec.batch.size";
-  public static final int DEFAULT_LOAD_SPEC_BATCH_SIZE = 100;
+  public static final int DEFAULT_LOAD_SPEC_BATCH_SIZE = 500;
+  public static final String SKIP_SCHEDULING_FLOWS_AFTER_NUM_DAYS = "skip.scheduling.flows.after.num.days";
+  public static final int DEFAULT_NUM_DAYS_TO_SKIP_AFTER = 100;

Review Comment:
   In practice, almost nobody schedules a flow more than a month in advance:
there may be some monthly or weekly flows, but most are daily. The majority of
these flows should actually be either years out (2050) or well within this
time frame. What do you think about that?





Issue Time Tracking
-------------------

    Worklog Id:     (was: 849922)
    Time Spent: 40m  (was: 0.5h)

> Skip scheduling flows far into future
> -------------------------------------
>
>                 Key: GOBBLIN-1797
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-1797
>             Project: Apache Gobblin
>          Issue Type: Improvement
>          Components: gobblin-service
>            Reporter: Urmi Mustafi
>            Assignee: Abhishek Tiwari
>            Priority: Major
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> The unschedule feature linked below sets a flow's schedule to run on Jan 1st,
> 2050, so far in advance that it will "never run":
> [https://jarvis.corp.linkedin.com/codesearch/result/?name=FlowConfigResourceLocalHandler.java&path=gobblin-elr%2Fgobblin-restli%2Fgobblin-flow-config-service%2Fgobblin-flow-config-service-server%2Fsrc%2Fmain%2Fjava%2Forg%2Fapache%2Fgobblin%2Fservice&reponame=linkedin%2Fgobblin-elr#62]
> However, there are potentially over 100k of these flows, so we are loading
> and scheduling many unnecessary flows. On initialization we add a check that
> verifies the next run of the flow is within a certain time frame (100 days by
> default) and loads the flow into the scheduler only if it is. We choose that
> default value under the assumption that we will redeploy GaaS at least every
> 100 days, so a far-out scheduled flow will be loaded into the scheduler as
> its run date approaches. In most cases, however, users schedule flows for the
> near future or immediately, and those will all be scheduled. This PR also
> renames metrics and adds helpful new ones.
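
The check described above could look roughly like the sketch below. It is only an illustration, not the code from the PR: the class `FlowScheduleFilter`, its methods, and the flow names are hypothetical, the next run time is passed in as an `Instant` rather than computed from a cron expression, and treating the cutoff as inclusive is an assumption.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not a class from the Gobblin code base.
class FlowScheduleFilter {
  // Same value as DEFAULT_NUM_DAYS_TO_SKIP_AFTER in the diff above.
  static final int DEFAULT_NUM_DAYS_TO_SKIP_AFTER = 100;

  // Returns true when the flow's next run falls on or before now + numDays.
  // Whether the real check is inclusive is an assumption of this sketch.
  static boolean isWithinWindow(Instant nextRun, Instant now, int numDays) {
    Instant cutoff = now.plus(Duration.ofDays(numDays));
    return !nextRun.isAfter(cutoff);
  }

  // Keeps only the flows whose next run is inside the window, preserving order.
  static List<String> selectFlowsToSchedule(
      Map<String, Instant> nextRunByFlow, Instant now, int numDays) {
    List<String> selected = new ArrayList<>();
    for (Map.Entry<String, Instant> entry : nextRunByFlow.entrySet()) {
      if (isWithinWindow(entry.getValue(), now, numDays)) {
        selected.add(entry.getKey());
      }
    }
    return selected;
  }

  public static void main(String[] args) {
    Instant now = Instant.parse("2023-03-08T00:00:00Z");
    Map<String, Instant> nextRunByFlow = new LinkedHashMap<>();
    // A daily flow whose next run is tomorrow.
    nextRunByFlow.put("daily-flow", Instant.parse("2023-03-09T00:00:00Z"));
    // An "unscheduled" flow parked on Jan 1st, 2050.
    nextRunByFlow.put("unscheduled-flow", Instant.parse("2050-01-01T00:00:00Z"));

    System.out.println(
        selectFlowsToSchedule(nextRunByFlow, now, DEFAULT_NUM_DAYS_TO_SKIP_AFTER));
  }
}
```

In this sketch the flow parked in 2050 is filtered out at load time, while the daily flow is kept. A flow skipped this way would only be picked up by a later initialization, which is why the description ties the default to the redeploy interval.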



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
