kroeders opened a new pull request #10129:
URL: https://github.com/apache/druid/pull/10129


   In a cluster with a large number of data sources, it is convenient to reuse 
retention rules across data sources. The default rules on a cluster apply to 
every data source, so they are not sufficiently fine-grained. It would be 
useful to provide a flexible mechanism for sharing rules between data sources. 
A new import rule could dynamically import rules from another data source every 
time rules are applied. This can be implemented with the existing API by adding 
a new rule type that specifies an imported rule set, as shown here. 
   
   
![sizecheck](https://user-images.githubusercontent.com/8482587/86393561-3ed6ab80-bc6b-11ea-8521-663126be0475.gif)
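
   As a sketch, a rule list for a data source that imports a shared rule set 
might look like the following. The `import` type and its `dataSource` field are 
assumptions for illustration, not the final spec from this PR; the 
`loadForever` rule is a standard Druid retention rule.

```json
[
  { "type": "import", "dataSource": "shared-rules" },
  { "type": "loadForever", "tieredReplicants": { "_default_tier": 2 } }
]
```

   When rules are retrieved, the hypothetical `import` entry above would be 
replaced inline by the rules of `shared-rules`, followed by the data source's 
own rules and the cluster defaults.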
   
   ### Description
   
   This branch adds a new ImportRule type that specifies an imported rule set. 
Whenever rules are used, they are expanded to include the default rules for the 
cluster; this change extends getRulesWithDefault to also expand any import 
rules at the time the rules are retrieved. The result is functionally 
equivalent to the original list, since import rules are replaced before 
evaluation. Rules are also used by CoordinatorRuleManager in the router, which 
previously duplicated getRulesWithDefault. That implementation now lives in one 
place, SQLMetadataRuleManager, and a new method on RulesResource lets 
CoordinatorRuleManager retrieve the expanded list of rules without knowing how 
it was generated. 
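
   The expansion described above can be sketched roughly as follows. This is a 
minimal, self-contained illustration, not the actual Druid classes: `Rule`, 
`LoadRule`, `ImportRule`, and `expand` are all hypothetical names, and the real 
implementation lives in SQLMetadataRuleManager. The key point it demonstrates 
is replacing import rules inline while tracking visited data sources so that 
cyclic imports terminate.

```java
import java.util.*;

// Hypothetical stand-ins for Druid's rule types (illustrative only).
interface Rule {}

class LoadRule implements Rule {
  final String name;
  LoadRule(String name) { this.name = name; }
}

class ImportRule implements Rule {
  final String dataSource; // the data source whose rules are imported
  ImportRule(String dataSource) { this.dataSource = dataSource; }
}

public class RuleExpansion {
  // Expand import rules depth-first. A data source already in `visited`
  // is skipped, so cyclic imports are broken rather than recursing forever.
  static List<Rule> expand(String dataSource,
                           Map<String, List<Rule>> rulesBySource,
                           Set<String> visited) {
    List<Rule> out = new ArrayList<>();
    if (!visited.add(dataSource)) {
      return out; // cycle: this data source is already being expanded
    }
    for (Rule r : rulesBySource.getOrDefault(dataSource, List.of())) {
      if (r instanceof ImportRule) {
        out.addAll(expand(((ImportRule) r).dataSource, rulesBySource, visited));
      } else {
        out.add(r);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    Map<String, List<Rule>> rules = new HashMap<>();
    rules.put("shared", List.of(new LoadRule("loadByPeriod"), new LoadRule("dropForever")));
    // ds1 imports the shared rules and (erroneously) itself.
    rules.put("ds1", List.of(new ImportRule("shared"), new ImportRule("ds1")));

    List<Rule> expanded = expand("ds1", rules, new HashSet<>());
    // The shared rules are inlined; the cyclic self-import is dropped.
    System.out.println(expanded.size()); // prints 2
  }
}
```

   Because expansion happens once at retrieval time, evaluation code sees an 
ordinary flat rule list and needs no knowledge of imports.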
   
   Alternatives: 
   
   - **Flexible Defaults**: allow multiple default rule lists instead of just 
one. Because of the prevalent assumption that ForeverLoad is the default rule, 
it would be complicated to remove that assumption everywhere, so a second tier 
of defaults would be needed. 
   
   - **Clone Rules**: rules from other data sources could be copied using 
existing APIs, but changes to the source rules would not be reflected in the 
copies. This is a maintenance problem, as the user would still need to review 
every data source on each change. 
   
   - **Evaluation-time Import Rules**: as an alternative to modifying the rule 
list when it is expanded, the import could be done at rule evaluation time. 
This would require evaluating imports and breaking cycles for each segment, and 
would introduce more complex test cases around concurrent changes to retention 
rules. 
   
   In this implementation, the behavior of getRulesWithDefault has diverged 
somewhat from its name; a name like getExpandedRules might be better. Also, the 
ImportRule expansion is implemented specifically in SQLMetadataRuleManager; it 
might belong in interface methods or in a utility class so that other 
implementations can reuse it in the future. 
   
   <hr>
   
   This PR has:
   - [X] been self-reviewed.
   - [X] added documentation for new or modified features or behaviors.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked 
related entities via Javadoc links.
   - [X] added comments explaining the "why" and the intent of the code 
wherever would not be obvious for an unfamiliar reader.
   - [X] added unit tests or modified existing tests to cover new code paths, 
ensuring the threshold for [code 
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
 is met.
   - [ ] added integration tests.
   - [X] been tested in a test Druid cluster.
   
   <hr>
   
   ##### Key changed/added classes in this PR
    * `SQLMetadataRuleManager`
    * `ImportRule`
    * `CoordinatorRuleManager`
    * `RulesResource`
   

