Urmi Mustafi created GOBBLIN-1783:
-------------------------------------

             Summary: Initialize scheduler with batch gets instead of 
individual get per flow
                 Key: GOBBLIN-1783
                 URL: https://issues.apache.org/jira/browse/GOBBLIN-1783
             Project: Apache Gobblin
          Issue Type: Bug
          Components: gobblin-service
            Reporter: Urmi Mustafi
            Assignee: Abhishek Tiwari


We seek to improve the initialization time of the JobScheduler after a restart
or leadership change by batching the MySQL queries that fetch flow specs.
Instead of making one MySQL get call for each flow execution id, which scales
extremely poorly with the number of flows, we should group the gets to reduce
the number of calls and the resulting downtime.

This implementation adds two new functions to the SpecStore interface,
getSortedSpecs and getBatchedSpecs, which we use to achieve the batching.
Because both are generic enough to be useful in any derived class of
SpecStore, we add them to the base class. Although this requires every child
class to implement them, it lets any consumer of the parent SpecStore use the
functionality without caring which specific implementation is in use (as
JobScheduler does). Additionally, getBatchedSpecs requires an offset, or
starting point, from which to fetch each batch, so the consumer has to do some
bookkeeping of where it is in the paginated gets; this again separates the
functionality from the consumer's use case. The entirety of the flow catalog
is too large to load into memory for the Scheduler, so we use this batch
functionality.
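
The consumer-side bookkeeping described above could look roughly like the
sketch below. It is illustrative only: PagedSpecStore, InMemorySpecStore, and
scheduleAll are hypothetical names, specs are reduced to plain strings, and the
getBatchedSpecs(start, batchSize) signature is an assumption rather than the
actual SpecStore method. The point is that the caller tracks the offset and
holds only one batch in memory at a time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

public class BatchedSpecLoadSketch {

  /** Hypothetical stand-in for a spec store; NOT the real Gobblin SpecStore signature. */
  interface PagedSpecStore {
    /** Returns up to batchSize specs, in a stable sorted order, starting at offset start. */
    List<String> getBatchedSpecs(int start, int batchSize);
  }

  /** In-memory store that exists only to make the sketch runnable. */
  static class InMemorySpecStore implements PagedSpecStore {
    private final List<String> sorted;
    int calls = 0; // counts store round trips, standing in for MySQL queries

    InMemorySpecStore(List<String> specs) {
      this.sorted = new ArrayList<>(specs);
      Collections.sort(this.sorted); // a stable order keeps pages from overlapping
    }

    @Override
    public List<String> getBatchedSpecs(int start, int batchSize) {
      calls++;
      if (start >= sorted.size()) {
        return new ArrayList<>();
      }
      int end = Math.min(start + batchSize, sorted.size());
      return new ArrayList<>(sorted.subList(start, end));
    }
  }

  /**
   * Pages through the store, handing each spec to the scheduler callback.
   * Only the current batch is held in memory. Returns the number of specs seen.
   */
  static int scheduleAll(PagedSpecStore store, int batchSize, Consumer<String> scheduler) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    int offset = 0; // the consumer's bookkeeping of its position in the paginated gets
    while (true) {
      List<String> batch = store.getBatchedSpecs(offset, batchSize);
      batch.forEach(scheduler);
      offset += batch.size();
      if (batch.size() < batchSize) {
        return offset; // a short (or empty) page means the catalog is exhausted
      }
    }
  }

  public static void main(String[] args) {
    InMemorySpecStore store =
        new InMemorySpecStore(Arrays.asList("flowE", "flowA", "flowD", "flowB", "flowC"));
    int count = scheduleAll(store, 2, spec -> System.out.println("scheduling " + spec));
    System.out.println(count + " specs in " + store.calls + " store calls");
  }
}
```

With a batch size of B, the number of store calls grows with the number of
flows divided by B rather than with the number of flows itself, which is the
reduction in calls this ticket is after.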



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
