writer-jill commented on code in PR #13181: URL: https://github.com/apache/druid/pull/13181#discussion_r1008247507
########## docs/operations/rule-configuration.md: ########## @@ -22,213 +22,304 @@ title: "Retaining or automatically dropping data" ~ under the License. --> +Data retention rules allow you to configure Apache Druid to conform to your data retention policies. Your data retention policies specify which data to retain and which data to drop from the cluster. -In Apache Druid, Coordinator processes use rules to determine what data should be loaded to or dropped from the cluster. Rules are used for data retention and query execution, and are set via the [web console](./web-console.md). +Druid supports [load](#load-rules), [drop](#drop-rules), and [broadcast](#broadcast-rules) rules. Each rule is a JSON object. See the [rule definitions below](#load-rules) for details. -There are three types of rules, i.e., load rules, drop rules, and broadcast rules. Load rules indicate how segments should be assigned to different historical process tiers and how many replicas of a segment should exist in each tier. -Drop rules indicate when segments should be dropped entirely from the cluster. Finally, broadcast rules indicate how segments of different datasources should be co-located in Historical processes. +You can configure a default set of rules to apply to all datasources, and/or you can set specific rules for specific datasources. See [rule structure](#rule-structure) to see how rule order impacts the way the Coordinator applies retention rules. -The Coordinator loads a set of rules from the metadata storage. Rules may be specific to a certain datasource and/or a -default set of rules can be configured. Rules are read in order and hence the ordering of rules is important. The -Coordinator will cycle through all used segments and match each segment with the first rule that applies. Each segment -may only match a single rule. +You can specify the data to retain or drop in the following ways: -Note: It is recommended that the web console is used to configure rules. 
However, the Coordinator process does have HTTP endpoints to programmatically configure rules. +- Forever: all data in the segment. +- Period: segment data specified as an offset from the present time. +- Interval: a fixed time range. + +Retention rules are persistent: they remain in effect until you change them. Druid stores retention rules in its [metadata store](../dependencies/metadata-storage.md). + +## Set retention rules + +You can use the Druid [web console](./web-console.md) or the [Coordinator API](./api-reference.md#coordinator) to create and manage retention rules. + +### Use the web console + +To set retention rules in the Druid web console: + +1. On the console home page, click **Datasources**. +2. Click the name of your datasource to open the data window. +3. Select **Actions > Edit retention rules**. +4. Click **+New rule**. +5. Select a rule type and set properties according to the [rules reference](). +6. Click **Next** and enter a description for the rule. +7. Click **Save** to save and apply the rule to the datasource. + +### Use the Coordinator API + +To set one or more default retention rules for all datasources, send a POST request containing a JSON object for each rule to `/druid/coordinator/v1/rules/_default`. + +The following example request sets a default forever broadcast rule for all datasources: + +```bash +curl --location --request POST 'http://localhost:8888/druid/coordinator/v1/rules/_default' \ +--header 'Content-Type: application/json' \ +--data-raw '[{ + "type": "broadcastForever" + }]' +``` + +To set one or more retention rules for a specific datasource, send a POST request containing a JSON object for each rule to `/druid/coordinator/v1/rules/{datasourceName}`. 
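(Editor's aside, not part of the docs change under review: after POSTing rules, it can be useful to read back what the Coordinator actually stored. The Coordinator also serves rules for a single datasource at `/druid/coordinator/v1/rules/{datasourceName}`. The router address matches the examples in this page; the `wikipedia` datasource name is illustrative.)

```shell
# Read back the retention rules currently stored for one datasource.
# An empty array [] means no datasource-specific rules are set, and the
# cluster-wide default rules apply instead.
curl --location --request GET 'http://localhost:8888/druid/coordinator/v1/rules/wikipedia'
```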
+
+The following example request sets a period drop rule and a period broadcast rule for the `wikipedia` datasource:
+
+```bash
+curl --location --request POST 'http://localhost:8888/druid/coordinator/v1/rules/wikipedia' \
+--header 'Content-Type: application/json' \
+--data-raw '[{
+    "type": "dropByPeriod",
+    "period": "P1M",
+    "includeFuture": true
+  },
+  {
+    "type": "broadcastByPeriod",
+    "period": "P1M",
+    "includeFuture": true
+  }]'
+```
+
+To retrieve all rules for all datasources, send a GET request to `/druid/coordinator/v1/rules`. For example:
+
+```bash
+curl --location --request GET 'http://localhost:8888/druid/coordinator/v1/rules'
+```
+
+### Rule structure
+
+The rules API accepts an array of rules as JSON objects. The JSON object you send in the API request for each rule is specific to the rule types outlined below.
+
+> You must pass the entire array of rules, in your desired order, with each API request. Each POST request to the rules API overwrites the existing rules for the specified datasource.
+
+The order of rules is important. The Coordinator reads rules in the order in which they appear in the rules list. For example, in the following screenshot the Coordinator evaluates data against rule 1, then rule 2, then rule 3:
+
+The Coordinator cycles through all used segments and matches each segment with the first rule that applies. Each segment can only match a single rule.
+
+In the web console you can use the up and down arrows on the right side of the interface to change the order of the rules.
 
 ## Load rules
 
-Load rules indicate how many replicas of a segment should exist in a server tier. **Please note**: If a Load rule is used to retain only data from a certain interval or period, it must be accompanied by a Drop rule. If a Drop rule is not included, data not within the specified interval or period will be retained by the default rule (loadForever).
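(Editor's aside, not part of the docs change under review: the load-plus-drop pairing described above can be sketched as a single ordered rules array. The `hot` tier name and replica counts here are illustrative, not from the Druid docs. The `loadByPeriod` rule keeps the most recent month on the tiers listed in `tieredReplicants`; the trailing `dropForever` rule drops every segment the first rule did not match.)

```json
[
  {
    "type": "loadByPeriod",
    "period": "P1M",
    "includeFuture": true,
    "tieredReplicants": {
      "hot": 2,
      "_default_tier": 1
    }
  },
  {
    "type": "dropForever"
  }
]
```

Because the Coordinator matches each segment against the first applicable rule, the order matters: if `dropForever` came first, it would match every segment and the load rule would never apply.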
+Load rules define how Druid assigns segments to [historical process tiers](./mixed-workloads.md#historical-tiering), and how many replicas of a segment exist in each tier.
+
+If you have a single tier, Druid automatically names the tier `_default` and loads all segments onto it. If you define an additional tier, you must define a load rule to specify which segments to load on that tier. Until you define a load rule, your new tier remains empty.
 
-### Forever Load Rule
+### Forever load rule
 
-Forever load rules are of the form:
+The forever load rule assigns segment data to specified tiers. It is the default rule Druid applies to datasources. Forever load rules have type `loadForever`.

Review Comment:
   Updated.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
