[jira] [Work logged] (GOBBLIN-1783) Initialize scheduler with batch gets instead of individual get per flow

ASF GitHub Bot (Jira) Thu, 16 Feb 2023 14:47:13 -0800


     [ 
https://issues.apache.org/jira/browse/GOBBLIN-1783?focusedWorklogId=846012&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-846012
 ]


ASF GitHub Bot logged work on GOBBLIN-1783:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 16/Feb/23 22:46
            Start Date: 16/Feb/23 22:46
    Worklog Time Spent: 10m 
      Work Description: umustafi commented on code in PR #3640:
URL: https://github.com/apache/gobblin/pull/3640#discussion_r1109099163


##########
gobblin-runtime/src/test/java/org/apache/gobblin/runtime/spec_store/MysqlSpecStoreWithUpdateTest.java:
##########
@@ -325,19 +325,18 @@ public void testGetAllSpecPaginate() throws Exception {
     Assert.assertTrue(specs.contains(this.flowSpec4));
 
     // Return all flowSpecs of index [0, 2). Total of 3 flowSpecs, only return first two.
-    specs = this.specStore.getSpecs(0,2);
+    specs = this.specStore.getSpecsPaginated(0,2);
     Assert.assertEquals(specs.size(), 2);
     Assert.assertTrue(specs.contains(this.flowSpec1));
     Assert.assertTrue(specs.contains(this.flowSpec2));
     Assert.assertFalse(specs.contains(this.flowSpec4));
 
-    // Return all flowSpecs of index [0, 2). Total of 3 flowSpecs, only return first two.
-    // Check that functionality for not including a start value is the same as including start value of 0
-    specs = this.specStore.getSpecs(-1, 2);
+    // Return all flowSpecs from index 1 to 3 - 1. Total of 2 flowSpecs, only return second two.
+    specs = this.specStore.getSpecsPaginated(1, 2);

Review Comment:
   Added tests for the edge cases: a negative or zero count or offset, and a 
start offset past the length of the store.





Issue Time Tracking
-------------------

    Worklog Id:     (was: 846012)
    Time Spent: 1h 40m  (was: 1.5h)

> Initialize scheduler with batch gets instead of individual get per flow
> -----------------------------------------------------------------------
>
>                 Key: GOBBLIN-1783
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-1783
>             Project: Apache Gobblin
>          Issue Type: Bug
>          Components: gobblin-service
>            Reporter: Urmi Mustafi
>            Assignee: Abhishek Tiwari
>            Priority: Major
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> We seek to improve the initialization time of the JobScheduler upon restart or 
> a leadership change by batching the MySQL queries that get flow specs. 
> Instead of making one MySQL get call for each flow execution id, which scales 
> extremely poorly with the number of flows, we should group them to reduce the 
> number of calls and the downtime.
> This implementation adds two new functions to the SpecStore interface, 
> getSortedSpecURIs and getBatchedSpecs, that we use to achieve the batching. 
> Because these two functions are generic enough to be used in derived 
> classes of the SpecStore, we add them to the base class. Although this 
> requires every child class to implement these functions, it allows any 
> consumer of the parent class SpecStore to use this functionality without 
> caring about the specific SpecStore implementation used (as 
> JobScheduler does). Additionally, getBatchedSpecs requires an offset or 
> starting point to obtain the batches from, so the consumer has to do some 
> bookkeeping of where in the paginated gets it is, but this again separates the 
> functionality from the use case of the consumer. The entirety of the flow 
> catalog is too large to load into memory for the Scheduler, so we use this 
> batch functionality. 
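
The offset bookkeeping described in the issue can be illustrated with a minimal sketch. It is not the code from the PR: the PagedStore interface, the InMemoryStore class, and the forEachBatch helper are hypothetical names invented for illustration, and the (startOffset, batchSize) argument order of getSpecsPaginated is only inferred from the test calls in the diff above. Rejecting negative arguments with an exception is likewise an assumption, not confirmed behavior of the real store.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

class BatchedSpecLoader {
  /** Hypothetical stand-in for a paginated store; NOT Gobblin's SpecStore interface. */
  interface PagedStore<T> {
    List<T> getSpecsPaginated(int startOffset, int batchSize);
  }

  /** In-memory store that exists only to make the sketch runnable. */
  static class InMemoryStore<T> implements PagedStore<T> {
    private final List<T> items;
    int calls = 0; // number of "queries" issued, for comparison with one call per item

    InMemoryStore(List<T> items) {
      this.items = new ArrayList<>(items);
    }

    @Override
    public List<T> getSpecsPaginated(int startOffset, int batchSize) {
      calls++;
      // Assumed edge-case handling: reject a negative offset or non-positive count.
      if (startOffset < 0 || batchSize <= 0) {
        throw new IllegalArgumentException("offset must be >= 0 and batch size > 0");
      }
      // An offset past the end of the store yields an empty batch.
      if (startOffset >= items.size()) {
        return new ArrayList<>();
      }
      int end = Math.min(items.size(), startOffset + batchSize);
      return new ArrayList<>(items.subList(startOffset, end));
    }
  }

  /**
   * Walks the store one batch at a time so that only a single batch is held
   * in memory. The caller-side bookkeeping is the running offset.
   * Returns the total number of items handed to the handler.
   */
  static <T> int forEachBatch(PagedStore<T> store, int batchSize, Consumer<List<T>> handler) {
    int offset = 0;
    while (true) {
      List<T> batch = store.getSpecsPaginated(offset, batchSize);
      if (!batch.isEmpty()) {
        handler.accept(batch);
      }
      offset += batch.size();
      // A short (or empty) batch means the end of the store was reached.
      if (batch.size() < batchSize) {
        return offset;
      }
    }
  }

  public static void main(String[] args) {
    InMemoryStore<String> store = new InMemoryStore<>(Arrays.asList("flowA", "flowB", "flowC"));
    int total = forEachBatch(store, 2, batch -> System.out.println("scheduling " + batch));
    System.out.println(total + " specs in " + store.calls + " calls");
  }
}
```

With a batch size of b and n specs in the store, this loop issues roughly n / b calls instead of n, which is the reduction the issue is after. The handler sees one batch at a time, so the full catalog never has to be resident in memory.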



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
