Urmi Mustafi created GOBBLIN-1783:
-------------------------------------
Summary: Initialize scheduler with batch gets instead of
individual get per flow
Key: GOBBLIN-1783
URL: https://issues.apache.org/jira/browse/GOBBLIN-1783
Project: Apache Gobblin
Issue Type: Bug
Components: gobblin-service
Reporter: Urmi Mustafi
Assignee: Abhishek Tiwari
We seek to improve the initialization time of the JobScheduler upon restart or
leadership change by batching the MySQL queries that fetch flow specs. Instead
of making one MySQL get call for each flow execution id, which scales extremely
poorly with the number of flows, we should group them to reduce the number of
calls and the downtime.
This implementation adds two new functions to the SpecStore interface,
getSortedSpecs and getBatchedSpecs, which we use to achieve the batching.
Because both are generic enough to be useful in any derived class of SpecStore,
we add them to the base class. Although this requires every child class to
implement these functions, it allows any consumer of the parent class SpecStore
to use this functionality without caring about the specific SpecStore
implementation in use (as JobScheduler does).
Additionally, getBatchedSpecs requires an offset, or starting point, from which
to fetch each batch, so the consumer has to do some bookkeeping to track its
position in the paginated gets. This again keeps the store's functionality
separate from the consumer's use case. The entire flow catalog is too large for
the Scheduler to load into memory at once, so we use this batch functionality.
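The consumer-side bookkeeping described above can be sketched roughly as
follows. This is only an illustration under stated assumptions: the SpecStore
interface here is a simplified stand-in, the getBatchedSpecs(offset, batchSize)
signature is a guess and not the one in Gobblin, specs are reduced to plain
strings, and InMemorySpecStore and loadAll are hypothetical names. A
MySQL-backed store would presumably translate each call into one paginated
query over a stable sort order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

public class BatchedSpecLoading {

  /** Hypothetical, simplified stand-in for SpecStore (not the real Gobblin interface). */
  interface SpecStore {
    // Assumed signature: up to batchSize specs, in a stable sorted order, starting at offset.
    List<String> getBatchedSpecs(int offset, int batchSize);
  }

  /** In-memory store for illustration only; it counts calls to show the reduction. */
  static class InMemorySpecStore implements SpecStore {
    private final List<String> sorted;
    int calls = 0;

    InMemorySpecStore(List<String> specs) {
      this.sorted = new ArrayList<>(specs);
      Collections.sort(this.sorted); // a stable order is what makes offsets meaningful
    }

    @Override
    public List<String> getBatchedSpecs(int offset, int batchSize) {
      calls++;
      if (offset >= sorted.size()) {
        return Collections.emptyList();
      }
      int end = Math.min(offset + batchSize, sorted.size());
      return new ArrayList<>(sorted.subList(offset, end));
    }
  }

  /**
   * Consumer-side loop: the caller tracks the offset, hands each spec to the
   * handler, and never holds more than one batch in memory. Returns the
   * number of specs processed.
   */
  static int loadAll(SpecStore store, int batchSize, Consumer<String> handler) {
    int offset = 0;
    while (true) {
      List<String> batch = store.getBatchedSpecs(offset, batchSize);
      batch.forEach(handler); // a real scheduler would schedule each spec here
      offset += batch.size();
      if (batch.size() < batchSize) {
        return offset; // a short or empty batch means the catalog is exhausted
      }
    }
  }

  public static void main(String[] args) {
    InMemorySpecStore store =
        new InMemorySpecStore(Arrays.asList("flowC", "flowA", "flowE", "flowB", "flowD"));
    int processed = loadAll(store, 2, spec -> System.out.println("scheduling " + spec));
    System.out.println(processed + " specs in " + store.calls + " calls");
  }
}
```

With five specs and a batch size of two, the loop issues three store calls
instead of five. The saving grows with the batch size, at the cost of holding a
larger batch in memory at a time.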
--
This message was sent by Atlassian Jira
(v8.20.10#820010)