kroeders opened a new pull request #10129: URL: https://github.com/apache/druid/pull/10129
In a cluster with a large number of data sources, it is convenient to reuse retention rules across data sources. The cluster's default rules apply to every data source, so they are not fine-grained enough for this. It would be useful to have a flexible mechanism for sharing rules between data sources. A new import rule could dynamically import the rules of another data source every time rules are applied. This can be implemented using the existing API with the addition of a new rule type that specifies an imported rule set.

### Description

This branch adds a new `ImportRule` type that specifies an imported rule set. Whenever rules are used, they are expanded to include the default rules for the cluster. This PR changes `getRulesWithDefault` to also expand any import rules found at the time the rules are retrieved. The expanded list is functionally equivalent to the original list, because the import rules are replaced before evaluation.

Rules are also used by `CoordinatorRuleManager` in the router, where `getRulesWithDefault` was implemented a second time. With this change, the implementation lives in one place, `SQLMetadataRuleManager`, and a new method is added to `RulesResource` so that `CoordinatorRuleManager` can retrieve the expanded list of rules without knowing how it was generated.

Alternatives:

- Flexible Defaults - Allow multiple default rule lists instead of just one. Because the assumption that `ForeverLoad` is the default rule is so prevalent, it would be complicated to remove that assumption everywhere, so a second tier of defaults would be needed.
- Clone Rules - Rules from other data sources could be copied using existing APIs, but later changes to the source rules would not be reflected in the copies. This is a maintenance problem, because the user would still need to review every data source whenever rules change.
- Evaluation Import Rules - As an alternative to modifying the rules list when it is expanded, the import could be done at rule evaluation time.
This requires evaluating imports and breaking cycles for each segment, and it introduces more complex test cases around concurrent changes to retention rules.

In the implementation, the behavior of `getRulesWithDefault` has diverged a little from its name; a name like `getExpandedRules` might be better. Also, the implementation for `ImportRule` lives specifically in `SQLMetadataRuleManager`; it might be better as interface methods or in a utility class, for use by other implementations in the future.

<hr>

This PR has:
- [X] been self-reviewed.
- [X] added documentation for new or modified features or behaviors.
- [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
- [X] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
- [X] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
- [ ] added integration tests.
- [X] been tested in a test Druid cluster.

<hr>

##### Key changed/added classes in this PR
 * `SQLMetadataRuleManager`
 * `ImportRule`
 * `CoordinatorRuleManager`
 * `RulesResource`
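
To make the expansion step described above concrete, here is a minimal sketch of replacing import rules with the imported data source's rules at retrieval time, breaking cycles, and then appending the cluster defaults. It is not the code in this PR: `ImportRuleSketch`, the simplified `Rule` class, the `expand` method, and the in-memory map standing in for the metadata store are all hypothetical, and silently skipping an import that would form a cycle is only one possible policy.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ImportRuleSketch {
    /** Hypothetical stand-in for a retention rule; this is not Druid's Rule interface. */
    static final class Rule {
        final String type;        // rule type name, or "import" for an import rule
        final String importFrom;  // data source to import from; null for ordinary rules

        private Rule(String type, String importFrom) {
            this.type = type;
            this.importFrom = importFrom;
        }

        static Rule plain(String type) {
            return new Rule(type, null);
        }

        static Rule importFrom(String dataSource) {
            return new Rule("import", dataSource);
        }

        boolean isImport() {
            return importFrom != null;
        }

        @Override
        public String toString() {
            return isImport() ? "import(" + importFrom + ")" : type;
        }
    }

    /** Expands imports for one data source, then appends the cluster defaults (assumed to go last). */
    static List<Rule> expand(String dataSource, Map<String, List<Rule>> rulesByDataSource, List<Rule> defaults) {
        List<Rule> out = new ArrayList<>();
        expandInto(dataSource, rulesByDataSource, new HashSet<>(), out);
        out.addAll(defaults);
        return out;
    }

    private static void expandInto(String dataSource, Map<String, List<Rule>> rulesByDataSource,
                                   Set<String> inProgress, List<Rule> out) {
        if (!inProgress.add(dataSource)) {
            return; // this data source is already being expanded: skip the import to break the cycle
        }
        for (Rule rule : rulesByDataSource.getOrDefault(dataSource, List.of())) {
            if (rule.isImport()) {
                // Replace the import rule in place with the imported data source's rules.
                expandInto(rule.importFrom, rulesByDataSource, inProgress, out);
            } else {
                out.add(rule);
            }
        }
        inProgress.remove(dataSource);
    }

    public static void main(String[] args) {
        // "shared" and "events" import each other, so expansion must break the cycle.
        Map<String, List<Rule>> rules = Map.of(
            "shared", List.of(Rule.plain("loadByPeriod"), Rule.importFrom("events")),
            "events", List.of(Rule.plain("dropBeforeByPeriod"), Rule.importFrom("shared")));
        List<Rule> defaults = List.of(Rule.plain("loadForever"));
        System.out.println(expand("events", rules, defaults));
    }
}
```

Because the import rules are replaced before evaluation, the evaluator only ever sees ordinary rules, which is why the expanded list can be treated like any other rule list. Doing the same work at evaluation time would repeat the cycle check for every segment.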
