[GitHub] [druid] kroeders opened a new issue #10237: Increase reusability of retention rule configurations

GitBox Tue, 04 Aug 2020 08:43:48 -0700


kroeders opened a new issue #10237:
URL: https://github.com/apache/druid/issues/10237



   ### Motivation
   
   We have a deployment with a number of different “types” of data source, each 
with its own approach to retention rules. For example, one group of servers 
could be configured to load a recent queryable dataset whereas another group of 
servers is configured to keep an archive based on policies for when data can be 
deleted. Adding a new data source of a known type involves manually copying the 
retention rules from an existing data source, which is time consuming and error 
prone. Similarly, changing the rules across an entire type of data source (for 
example, advancing a time period on a new quarter) means editing each data 
source individually. How can retention rule configurations be reused to 
streamline this maintenance?
   
   ### Description
   
   Import Rules Rule Type - The proposal is to add an import rule type that 
includes another ruleset dynamically at runtime. Effectively, this generalizes 
the default rules approach, but allows the user to import other rules anywhere 
in the rule chain. The imported rules could come from another data source or 
from an independent ("synthetic") ruleset that, like the existing default 
rules, is not tied to a single data source. A UI could be provided to edit 
synthetic rulesets without changing any APIs. Similarly, the UI could restrict 
users to importing rules only from synthetic rulesets if desired. 
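
   To make the idea concrete, here is a minimal sketch of how an import rule 
could be expanded into a flat rule chain. It is an illustration only: the 
`"import"` type name, the `"ruleset"` field, and the `expand_rules` helper are 
assumptions made for this example and are not taken from the pull request. The 
`loadByPeriod` and `dropForever` rules follow the shape of Druid's existing 
retention rules.

```python
from typing import Dict, List, Tuple

# Hypothetical rule type name; the issue does not specify the JSON shape.
IMPORT_TYPE = "import"


def expand_rules(
    name: str,
    rulesets: Dict[str, List[dict]],
    _active: Tuple[str, ...] = (),
) -> List[dict]:
    """Return the rule chain for `name`, with each import rule replaced,
    at its position in the chain, by the rules of the ruleset it names."""
    if name in _active:
        # The ruleset is already being expanded further up the call chain.
        raise ValueError("import cycle: " + " -> ".join(_active + (name,)))
    expanded: List[dict] = []
    for rule in rulesets.get(name, []):
        if rule.get("type") == IMPORT_TYPE:
            # "ruleset" is a hypothetical field naming the imported ruleset.
            expanded.extend(
                expand_rules(rule["ruleset"], rulesets, _active + (name,))
            )
        else:
            expanded.append(rule)
    return expanded


rulesets = {
    # A shared ruleset that is not tied to any one data source.
    "archive_policy": [
        {"type": "loadByPeriod", "period": "P1Y",
         "tieredReplicants": {"archive": 1}},
        {"type": "dropForever"},
    ],
    # A data source that adds one rule of its own, then imports the policy.
    "clicks": [
        {"type": "loadByPeriod", "period": "P1M",
         "tieredReplicants": {"hot": 2}},
        {"type": "import", "ruleset": "archive_policy"},
    ],
}

print([rule["type"] for rule in expand_rules("clicks", rulesets)])
```

   In this sketch, editing `archive_policy` once changes the effective rules 
of every data source that imports it, which is the reuse the proposal is after.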
   
   Pull Request [#10129](https://github.com/apache/druid/pull/10129) has an 
implementation, with some of the UI work done.
   
   ### Alternatives
   
   Datasource Specific Defaults - Allow for multiple default rule lists, 
instead of just one. This would require a database change to store the 
alternative default rules for a datasource. Even if a convention is used (e.g. 
<datasource>__default), some method for referring to other rulesets is needed. 
   
   Normalized Rule Groups - Each datasource currently stores its own copy of 
its rules; these could be broken out into a normalized rule table and linked to 
the data source. This would be a useful refactoring regardless and would allow 
rules to be reused. To provide both rule reuse and per-datasource 
customization, some additional reference would still be needed. 
   
   Clone Rules - Rules from other data sources could be copied using existing 
APIs, but later changes to the source rules would not propagate to the copies. 
This is a partial solution, as the user would still need to review every data 
source whenever the rules change. 
   
   Evaluation-Time Import Rules - As an alternative to expanding imports into 
the rules list, the import could be resolved at rule evaluation time. This 
requires evaluating imports and breaking cycles for each segment, and it 
introduces more complex test cases around concurrent changes to retention 
rules.
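
   A rough sketch of what resolving imports at evaluation time might look 
like, including the cycle breaking mentioned above. Everything here is 
hypothetical: the `"import"` type, the `"ruleset"` field, the 
`first_matching_rule` helper, and the caller-supplied `matches` predicate, 
which stands in for whatever check decides that a rule applies to a segment.

```python
from typing import Callable, Dict, List, Optional, Set


def first_matching_rule(
    name: str,
    rulesets: Dict[str, List[dict]],
    matches: Callable[[dict], bool],
    _visited: Optional[Set[str]] = None,
) -> Optional[dict]:
    """Walk the rule chain for `name` in order, following import rules
    lazily, and return the first rule for which `matches` is true."""
    visited = set() if _visited is None else _visited
    if name in visited:
        # Break the cycle. Skipping is safe as long as `matches` is
        # deterministic: this ruleset has either already been searched
        # without a match or is still being searched by a caller.
        return None
    visited.add(name)
    for rule in rulesets.get(name, []):
        if rule.get("type") == "import":  # hypothetical rule type
            found = first_matching_rule(
                rule["ruleset"], rulesets, matches, visited
            )
            if found is not None:
                return found
        elif matches(rule):
            return rule
    return None


# Two rulesets that import each other, to exercise the cycle guard.
rulesets = {
    "a": [
        {"type": "import", "ruleset": "b"},
        {"type": "loadForever", "tieredReplicants": {"_default_tier": 2}},
    ],
    "b": [
        {"type": "import", "ruleset": "a"},
        {"type": "dropForever"},
    ],
}

# Stand-in predicate: pretend only load rules apply to this segment.
rule = first_matching_rule("a", rulesets, lambda r: r["type"] == "loadForever")
print(rule["type"])
```

   Because this walk would run for every segment, the cycle check is repeated 
on each evaluation, and the result depends on the state of all imported 
rulesets at that moment. That is the source of the extra test complexity noted 
for this alternative.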
   

