kfaraz opened a new issue, #12822:
URL: https://github.com/apache/druid/issues/12822

   ## Motivation
   
   This proposal is inspired by the original thoughts in 
https://github.com/apache/druid/issues/9087 
   
   From a point of view of testing, segment balancing poses the following 
challenges:
   - In practice, balancing is a slow process and issues (or their resolution) 
may take several hours
     or even days to manifest themselves.
   - A bug may occur only in a very specific cluster setup, thus making it 
difficult to reproduce.
   - The level of confidence in any change made to the balancing logic or a balancing
      strategy is low, as there are no integration tests around this and the unit tests
      verify only very basic behaviour.
   - Owing to the large number of moving parts and underlying async execution, 
balancing is prone to
      erroneous behaviour resulting from race conditions. These race conditions 
are difficult to discover
      using typical unit tests.
      
   We can begin to address these concerns with a framework that allows users to 
simulate typical
   segment balancing scenarios with ease, preferably in a low duty environment, 
such as a unit test
   or an integration test. Such a framework can also help identify performance 
bottlenecks and
   potential bugs in the current system and even compare different balancing 
strategies.
   
   ## Possible approaches
   
   Given the requirements, we could choose any one of the following setups for 
a simulator:
   1. A running service
      - Pros:
        - Expose APIs to specify inputs
        - Support live visualizations on a browser
      - Cons:
        - Requires a large amount of manual intervention.
        - No method to verify the results of a run.
        - No way to save the input parameters of an ad-hoc test run (a DB is not an option)
        - The only real value-add compared to the other options is visualization, which would be overkill for the task at hand.
   2. An integration test framework
      - Pros:
         - Closely resemble a production setup
         - Live interaction between all of Druid's moving parts
      - Cons:
         - The very fact that it closely resembles a production setup makes this a bad candidate, as it would suffer from the same reproduction challenges.
         - difficult to recreate scenarios which involve a large number of servers or segments
         - not possible to verify the effect of multiple coordinator runs in a short span of time
         - resource-intensive
   3. A unit test framework
      - Pros:
         - Great degree of control
         - Easy to add a new combination of input parameters as a new test case
         - No manual intervention required for verification
         - Easy to recreate even the most elaborate of cluster setups and 
actions on the fly
         - The underlying framework can be extended to power visualizations if 
needed
      - Cons:
         - Not a perfect representation of the production environment (a vice 
allowed to all tests)
   
   ## Proposed changes
   
   As seen above, a unit-test framework would be the ideal candidate for these 
simulations.
   The framework should be able to:
   - recreate varied cluster setups
   - run simulations that can cycle through a large number of coordinator runs 
in a short amount of time
   - test the interaction of the main coordinator pieces involved
   - take the actions listed below at pre-determined points during the course 
of the simulation
   - verify results of the simulation
   
   ### Programmer inputs
   
   The framework should give control over the following aspects of the setup:
   
   Input | Details | Actions
   -|-|-
   cluster | server name, type, tier, size | add a server, remove a server
   segment | datasource, interval, version, partition num, size | add/remove from server, mark used/unused, publish new segments
   rules | type (foreverLoad, drop, etc.), replica count per tier | add a rule for a datasource, add a default rule
   configs | coordinator period, load queue type, load queue size, max segments to balance, and several other configs | set or update a config
   
   ### Basic setup
   
   The following classes are the objects under test and must not be mocked:
   - `DruidCoordinator`
   - `LoadQueuePeon`
   - various coordinator duties: `BalanceSegments`, `RunRules`, 
`UnloadUnusedSegments`, etc.
   
   The following behaviour needs to be mocked:
   - loading of a segment on a server
   - interactions with metadata store
   
   Since some behaviour is mocked, we might miss out on some actual production 
scenarios
   but the mock entities can always be augmented to account for it in specific 
test cases, say
   by adding a delay before completing a segment load or failing a metadata 
store operation.
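   As an illustration, the mocked segment-loading behaviour could look something like the
   sketch below. The class and method names here are hypothetical (not the actual ones from
   the PR): segment loads complete asynchronously after a configurable delay, and a given
   number of loads can be made to fail.
   
    ```java
    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;
    
    // Hypothetical stand-in for real segment loading: instead of issuing an HTTP
    // request to a historical, the load "completes" after a configurable delay,
    // and the first N loads can be forced to fail to simulate flaky servers.
    public class TestSegmentLoader {
        private final ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor();
        private final long delayMillis;
        private int failuresRemaining;
    
        public TestSegmentLoader(long delayMillis, int failuresToInject) {
            this.delayMillis = delayMillis;
            this.failuresRemaining = failuresToInject;
        }
    
        // Simulate loading a segment onto a server; the future completes with
        // true on success and false for an injected failure.
        public CompletableFuture<Boolean> loadSegment(String segmentId, String serverName) {
            CompletableFuture<Boolean> result = new CompletableFuture<>();
            exec.schedule(() -> {
                synchronized (this) {
                    if (failuresRemaining > 0) {
                        failuresRemaining--;
                        result.complete(false); // injected failure
                    } else {
                        result.complete(true);  // segment is now "loaded"
                    }
                }
            }, delayMillis, TimeUnit.MILLISECONDS);
            return result;
        }
    
        public void shutdown() {
            exec.shutdown();
        }
    }
    ```
   
   A test case could then configure the delay to be zero for fast simulations, or non-zero
   to expose the kinds of race conditions mentioned above.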
   
   ### Interim actions
   
   A custom coordinator duty can be used to invoke the specified actions at the 
end of every coordinator run.
   
    ```java
    // ActionRunnableCoordinatorDuty
    // (assumes fields: AtomicInteger runCount, Map<Integer, Action> actions)
    
    @Override
    public DruidCoordinatorRuntimeParams run(DruidCoordinatorRuntimeParams params) {
        final int currentRun = runCount.incrementAndGet();
    
        if (actions.containsKey(currentRun)) {
           actions.get(currentRun).invoke();
        }
        return params;
    }
    ```
   
   The actions for a particular simulation could be specified like this:
   
    ```java
    ActionRunnableCoordinatorDuty actionsDuty = ActionRunnableCoordinatorDuty
        .createDutyWithActions()
        .afterEveryRun(runNumber -> {collectStats(runNumber); stopIfBalanced();})
        .afterRun(10, () -> killHistorical("historicalXyz"))
        .afterRun(20, () -> addHistoricalTier("reporting_tier"))
        .afterRun(30, () -> updateLoadRules(...))
        .afterRun(50, () -> completeCoordinatorRun(...))
        .build();
    ```
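   One possible shape for the bookkeeping behind this fluent API is sketched below. This is
   a hypothetical simplification, not the actual implementation from the PR: actions are
   keyed by run number, with an optional callback invoked after every run.
   
    ```java
    import java.util.HashMap;
    import java.util.Map;
    import java.util.function.IntConsumer;
    
    // Hypothetical sketch of the builder backing the fluent syntax above.
    public class ActionScheduleBuilder {
        private final Map<Integer, Runnable> actions = new HashMap<>();
        private IntConsumer everyRunAction = runNumber -> {};
    
        public static ActionScheduleBuilder createDutyWithActions() {
            return new ActionScheduleBuilder();
        }
    
        public ActionScheduleBuilder afterEveryRun(IntConsumer action) {
            this.everyRunAction = action;
            return this;
        }
    
        public ActionScheduleBuilder afterRun(int runNumber, Runnable action) {
            actions.put(runNumber, action);
            return this;
        }
    
        // Invoked by the duty at the end of coordinator run number 'runNumber'.
        public void onRunCompleted(int runNumber) {
            everyRunAction.accept(runNumber);
            Runnable action = actions.get(runNumber);
            if (action != null) {
                action.run();
            }
        }
    }
    ```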
   
   
   ### Typical test case
   
   ```java
   
   @Test
   public void testBalancingThenTierShift() {
   
      // Initial rule, 2 replicas on _default_tier
      List<Rule> initialRules = Collections.singletonList(
         new ForeverLoadRule(Collections.singletonMap("_default_tier", 2))
      );
      
      // Updated rule, 3 replicas on reporting_tier
      List<Rule> tierShiftRules = Collections.singletonList(
         new ForeverLoadRule(Collections.singletonMap("reporting_tier", 3))
      );   
   
       // Balance segments across "_default_tier" first, then shift them to "reporting_tier"
       ActionRunnableCoordinatorDuty actionsDuty = ActionRunnableCoordinatorDuty
             .createDutyWithActions()
             .afterRun(20, () -> metadataRuleManager.overrideRule("wikitest", tierShiftRules, null))
             .afterRun(50, () -> completeCoordinatorRun(...))
             .build();
      
       // Create segments with a bit of fluent syntax sugar (or syntax syrup if you will)
      List<DataSegment> segments =
                 createSegmentsForDatasource("wikitest")
                            .overInterval("2022-01-01/2022-03-01")
                            .withGranularity(Granularities.DAY)
                            .andPartitions(10)
                            .eachOfSizeInMb(500);
                            
      // Create servers
      List<DruidServer> allServers = 
                 createHistoricals(
                        
createTier("_default_tier").withServers(5).eachOfSizeInGb(100)
                        
createTier("reporting_tier").withServers(3).eachOfSizeInGb(50)
                 );
      
      // Build and run the simulation                      
      buildSimulation()
             .withActionsDuty(actionsDuty)
             .withRules(initialRules)
             .withUsedSegments(segments)
             .withServers(allServers)
             .withCoordinatorPeriod("PT1s")
             .run();
         
      assertThatDatasourceIsFullyLoaded("wikitest");
      assertThatClusterIsBalanced();
   }
   ```
   
   ### Work status
   
   I am currently working on a PR which implements the above features, except for:
   - assertion of balanced state
   - starting a simulation with a given segment distribution without having to 
wait for initial balancing
   
   I have also been able to discover some race conditions thanks to this 
framework and
   intend to create subsequent PRs for those.
   
   ## Future work
   
   ### Measure of balance
   
   In order to verify the success of a simulated run, we need to define some 
criteria of success.
   The motivation for balancing segments and the major underlying strategy have been
   discussed at length [here](https://github.com/apache/druid/pull/2972).
   
   Taking this as our starting point, our measures of success should be that:
   - segments are evenly distributed across servers
   - time-adjacent segments are not co-located
   - the system is able to achieve this state as quickly as possible
   
   These parameters would need to be quantified and measured at the end of 
every run to get a clear sense
   of the balancing progress.
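   
   As one hypothetical example of such a quantified measure, the evenness of segment
   distribution could be summarized as the coefficient of variation (stddev / mean) of
   segment counts across servers in a tier. A value close to 0 means segments are evenly
   spread, so a simulation could assert that the metric falls below a threshold or
   decreases across runs. (This only captures the first criterion; co-location of
   time-adjacent segments would need a separate metric.)
   
    ```java
    import java.util.Arrays;
    
    // Hypothetical sketch of one possible "measure of balance": the coefficient
    // of variation of segment counts per server. Lower values indicate a more
    // evenly balanced tier; 0 means a perfectly even distribution.
    public class BalanceMetric {
        public static double coefficientOfVariation(int[] segmentCountPerServer) {
            double mean = Arrays.stream(segmentCountPerServer).average().orElse(0.0);
            if (mean == 0.0) {
                return 0.0;
            }
            double variance = Arrays.stream(segmentCountPerServer)
                .mapToDouble(count -> (count - mean) * (count - mean))
                .average()
                .orElse(0.0);
            return Math.sqrt(variance) / mean;
        }
    }
    ```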

