writer-jill commented on code in PR #13181:
URL: https://github.com/apache/druid/pull/13181#discussion_r1005736908
##########
docs/operations/rule-configuration.md:
##########
@@ -23,212 +23,242 @@ title: "Retaining or automatically dropping data"
-->
-In Apache Druid, Coordinator processes use rules to determine what data should
be loaded to or dropped from the cluster. Rules are used for data retention and
query execution, and are set via the [web console](./web-console.md).
+In Apache Druid, [Coordinator processes](../design/coordinator.md) use rules
to determine what data to retain in or drop from the cluster.
-There are three types of rules, i.e., load rules, drop rules, and broadcast
rules. Load rules indicate how segments should be assigned to different
historical process tiers and how many replicas of a segment should exist in
each tier.
-Drop rules indicate when segments should be dropped entirely from the cluster.
Finally, broadcast rules indicate how segments of different datasources should
be co-located in Historical processes.
+There are three types of rules: [load](#load-rules), [drop](#drop-rules), and
[broadcast](#broadcast-rules). See the sections below for more information on
each type.
-The Coordinator loads a set of rules from the metadata storage. Rules may be
specific to a certain datasource and/or a
-default set of rules can be configured. Rules are read in order and hence the
ordering of rules is important. The
-Coordinator will cycle through all used segments and match each segment with
the first rule that applies. Each segment
-may only match a single rule.
+The Coordinator loads a set of rules from the metadata storage. You can
configure a default set of rules to apply to all data sources, and/or you can
set specific rules for specific data sources.
-Note: It is recommended that the web console is used to configure rules.
However, the Coordinator process does have HTTP endpoints to programmatically
configure rules.
+## Set a rule
+
+To set a default retention rule for all data sources, send a POST request to
the following API:
+
+`/druid/coordinator/v1/rules/_default`
+
+To set a retention rule for a specific data source, send a POST request to the
following API:
+
+`/druid/coordinator/v1/rules/{dataSourceName}`
+
+The rules API accepts a list of rules. The payload you send in the API request
for each rule is specific to the rule types outlined below.
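For example, a minimal request body for either endpoint is a JSON array, even
when it contains a single rule (the tier name and replica count here are
illustrative):

```json
[
  {
    "type": "loadForever",
    "tieredReplicants": {
      "_default_tier": 2
    }
  }
]
```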
+
+You can also set rules using the [web console](./web-console.md). Go into a
data source and select **Actions** > **Edit retention rules**.
+
+### Rule order
+
+The order of rules is important. The Coordinator reads rules in the order in
which they appear in the rules list. For example, in the following screenshot,
the Coordinator evaluates data against rule 1, then rule 2, then rule 3:
+
+
+
+In the web console you can use the up and down arrows on the right side of the
interface to change the order of the rules.
+
+The Coordinator cycles through all used segments and matches each segment with
the first rule that applies. Each segment can only match a single rule.
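To illustrate first-match evaluation, consider this hypothetical two-rule list:
a segment from the past month matches the first rule and is loaded; any older
segment falls through to the second rule and is dropped (the tier name and
replica count are illustrative):

```json
[
  {
    "type": "loadByPeriod",
    "period": "P1M",
    "includeFuture": true,
    "tieredReplicants": {
      "_default_tier": 2
    }
  },
  {
    "type": "dropForever"
  }
]
```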
## Load rules
-Load rules indicate how many replicas of a segment should exist in a server
tier. **Please note**: If a Load rule is used to retain only data from a
certain interval or period, it must be accompanied by a Drop rule. If a Drop
rule is not included, data not within the specified interval or period will be
retained by the default rule (loadForever).
+Load rules define how Druid assigns segments to historical process tiers, and
how many replicas of a segment exist in each tier.
+
+If you want to use a load rule to retain only data from a defined period of
time, you must also define a drop rule. If you don't define a drop rule, Druid
retains data that doesn't lie within your defined period according to the
default rule, `loadForever`.
-### Forever Load Rule
+### Forever load rule
-Forever load rules are of the form:
+Forever load rules have type `loadForever` and the following example API
payload:
```json
{
- "type" : "loadForever",
+ "type": "loadForever",
"tieredReplicants": {
"hot": 1,
- "_default_tier" : 1
+ "_default_tier": 1
}
}
```
+Set the following property:
+- `tieredReplicants`: a JSON object containing tier names and the number of
replicas for each tier.
-* `type` - this should always be "loadForever"
-* `tieredReplicants` - A JSON Object where the keys are the tier names and
values are the number of replicas for that tier.
+The forever load rule is the default rule Druid applies to data sources.
+### Interval load rule
-### Interval Load Rule
+Interval load rules have type `loadByInterval` and the following example API
payload:
-Interval load rules are of the form:
```json
{
- "type" : "loadByInterval",
+ "type": "loadByInterval",
"interval": "2012-01-01/2013-01-01",
"tieredReplicants": {
"hot": 1,
- "_default_tier" : 1
+ "_default_tier": 1
}
}
```
-* `type` - this should always be "loadByInterval"
-* `interval` - A JSON Object representing ISO-8601 Intervals
-* `tieredReplicants` - A JSON Object where the keys are the tier names and
values are the number of replicas for that tier.
+Set the following properties:
+- `interval`: an [ISO-8601](https://en.wikipedia.org/wiki/ISO_8601) interval
string, such as `2012-01-01/2013-01-01`.
+- `tieredReplicants`: a JSON object containing tier names and the number of
replicas for each tier.
-### Period Load Rule
+### Period load rule
-Period load rules are of the form:
+Period load rules have type `loadByPeriod` and the following example API
payload:
```json
{
- "type" : "loadByPeriod",
- "period" : "P1M",
- "includeFuture" : true,
+ "type": "loadByPeriod",
+ "period": "P1M",
+ "includeFuture": true,
"tieredReplicants": {
"hot": 1,
- "_default_tier" : 1
+ "_default_tier": 1
}
}
```
-* `type` - this should always be "loadByPeriod"
-* `period` - A JSON Object representing ISO-8601 Periods
-* `includeFuture` - A JSON Boolean indicating whether the load period should
include the future. This property is optional, Default is true.
-* `tieredReplicants` - A JSON Object where the keys are the tier names and
values are the number of replicas for that tier.
+Set the following properties:
+- `period`: an [ISO-8601](https://en.wikipedia.org/wiki/ISO_8601) period
string, such as `P1M`. The period extends from some time in the past to the
future or to the current time, depending on the `includeFuture` flag.
+- `includeFuture`: a boolean flag to indicate whether the load period includes
the future. Defaults to `true`.
+- `tieredReplicants`: a JSON object containing tier names and the number of
replicas for each tier.
-The interval of a segment will be compared against the specified period. The
period is from some time in the past to the future or to the current time,
which depends on `includeFuture` is true or false. The rule matches if the
period *overlaps* the interval.
+Druid compares a segment's interval to the period you specify in the rule. The
rule matches if the period overlaps the segment interval.
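The overlap check can be sketched as follows. This is a hypothetical
illustration, not Druid's actual implementation; the function name is made up,
and the period `P1M` is approximated as 30 days:

```python
from datetime import datetime, timedelta

# Hypothetical sketch of loadByPeriod matching: the rule's period spans
# [now - period, now], or extends to the far future when includeFuture is
# true, and the rule matches when that span *overlaps* the segment interval.
def period_load_matches(seg_start, seg_end, now, period, include_future=True):
    period_start = now - period
    period_end = datetime.max if include_future else now
    # Overlap: the two intervals share at least one instant.
    return seg_start < period_end and seg_end > period_start

now = datetime(2013, 1, 1)
one_month = timedelta(days=30)  # rough stand-in for the ISO-8601 period "P1M"

# A segment from late December overlaps the trailing one-month window.
print(period_load_matches(datetime(2012, 12, 20), datetime(2012, 12, 21),
                          now, one_month))  # True
# A segment from 2010 does not.
print(period_load_matches(datetime(2010, 1, 1), datetime(2010, 1, 2),
                          now, one_month))  # False
```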
-## Drop Rules
+## Drop rules
-Drop rules indicate when segments should be dropped from the cluster.
+Drop rules define when Druid drops segments from the cluster.
-### Forever Drop Rule
+### Forever drop rule
-Forever drop rules are of the form:
+Forever drop rules have type `dropForever`:
```json
{
- "type" : "dropForever"
+ "type": "dropForever"
}
```
-* `type` - this should always be "dropForever"
+Druid drops all segments that match this rule from the cluster.
-All segments that match this rule are dropped from the cluster.
+### Interval drop rule
-
-### Interval Drop Rule
-
-Interval drop rules are of the form:
+Interval drop rules have type `dropByInterval` and the following example API
payload:
```json
{
- "type" : "dropByInterval",
- "interval" : "2012-01-01/2013-01-01"
+ "type": "dropByInterval",
+ "interval": "2012-01-01/2013-01-01"
}
```
-* `type` - this should always be "dropByInterval"
-* `interval` - A JSON Object representing ISO-8601 Periods
+Set the following property:
+- `interval`: an [ISO-8601](https://en.wikipedia.org/wiki/ISO_8601) interval
string, such as `2012-01-01/2013-01-01`.
-A segment is dropped if the interval contains the interval of the segment.
+Druid drops a segment if the specified interval contains the segment's interval.
-### Period Drop Rule
+### Period drop rule
-Period drop rules are of the form:
+Period drop rules have type `dropByPeriod` and the following example API
payload:
```json
{
- "type" : "dropByPeriod",
- "period" : "P1M",
- "includeFuture" : true
+ "type": "dropByPeriod",
+ "period": "P1M",
+ "includeFuture": true
}
```
-* `type` - this should always be "dropByPeriod"
-* `period` - A JSON Object representing ISO-8601 Periods
-* `includeFuture` - A JSON Boolean indicating whether the load period should
include the future. This property is optional, Default is true.
+Set the following properties:
+- `period`: an [ISO-8601](https://en.wikipedia.org/wiki/ISO_8601) period
string, such as `P1M`. The period extends from some time in the past to the
future or to the current time, depending on the `includeFuture` flag.
+- `includeFuture`: a boolean flag to indicate whether the drop period includes
the future. Defaults to `true`.
-The interval of a segment will be compared against the specified period. The
period is from some time in the past to the future or to the current time,
which depends on `includeFuture` is true or false. The rule matches if the
period *contains* the interval. This drop rule always dropping recent data.
+Druid compares a segment's interval to the period you specify in the rule. The
rule matches if the period contains the segment interval. This rule always
drops recent data.
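The containment check can be sketched as follows. This is a hypothetical
illustration, not Druid's actual implementation; the function name is made up,
and the period `P1M` is approximated as 30 days. Note the contrast with load
rules, which match on overlap rather than containment:

```python
from datetime import datetime, timedelta

# Hypothetical sketch of dropByPeriod matching: the rule's period spans
# [now - period, now], or extends to the far future when includeFuture is
# true, and the rule matches only when that span *contains* the whole
# segment interval.
def period_drop_matches(seg_start, seg_end, now, period, include_future=True):
    period_start = now - period
    period_end = datetime.max if include_future else now
    # Containment: the period must cover the entire segment interval.
    return period_start <= seg_start and seg_end <= period_end

now = datetime(2013, 1, 1)
one_month = timedelta(days=30)  # rough stand-in for the ISO-8601 period "P1M"

# A fully recent segment is contained, so the rule drops it.
print(period_drop_matches(datetime(2012, 12, 20), datetime(2012, 12, 21),
                          now, one_month))  # True
# A segment straddling the start of the window is not contained
# (it would still *overlap*, so a load rule with the same period would match).
print(period_drop_matches(datetime(2012, 11, 1), datetime(2012, 12, 21),
                          now, one_month))  # False
```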
-### Period Drop Before Rule
+### Period drop before rule
-Period drop before rules are of the form:
+Period drop before rules have type `dropBeforeByPeriod` and the following
example API payload:
```json
{
- "type" : "dropBeforeByPeriod",
- "period" : "P1M"
+ "type": "dropBeforeByPeriod",
+ "period": "P1M"
}
```
-* `type` - this should always be "dropBeforeByPeriod"
-* `period` - A JSON Object representing ISO-8601 Periods
+Set the following property:
+- `period`: an [ISO-8601](https://en.wikipedia.org/wiki/ISO_8601) period
string, such as `P1M`. The period extends from some time in the past to the
current time.
+
+Druid compares a segment's interval to the period you specify in the rule. The
rule matches if the segment interval is before the specified period.
-The interval of a segment will be compared against the specified period. The
period is from some time in the past to the current time. The rule matches if
the interval before the period. If you just want to retain recent data, you can
use this rule to drop the old data that before a specified period and add a
`loadForever` rule to follow it. Notes, `dropBeforeByPeriod + loadForever` is
equivalent to `loadByPeriod(includeFuture = true) + dropForever`.
+If you only want to retain recent data, you can use this rule to drop old data
before a specified period, and add a `loadForever` rule to retain the data that
follows it. Note that the rule combination `dropBeforeByPeriod` + `loadForever`
is equivalent to `loadByPeriod(includeFuture = true)` + `dropForever`.
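For example, a hypothetical rules list implementing the first combination,
which retains one month of recent data and drops everything older (the tier
name and replica count are illustrative):

```json
[
  {
    "type": "dropBeforeByPeriod",
    "period": "P1M"
  },
  {
    "type": "loadForever",
    "tieredReplicants": {
      "_default_tier": 2
    }
  }
]
```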
-## Broadcast Rules
+## Broadcast rules
-Broadcast rules indicate that segments of a data source should be loaded by
all servers of a cluster of the following types: historicals, brokers, tasks,
and indexers.
+Broadcast rules instruct Druid to load segments of a data source in all
brokers, historicals, tasks, and indexers in the cluster.
-Note that the broadcast segments are only directly queryable through the
historicals, but they are currently loaded on other server types to support
join queries.
+Note that the broadcast segments are only directly queryable through the
historicals, but Druid loads them on other server types to support join queries.
Review Comment:
Updated, although I can't find any document that talks about the index table
feature. Will ask Paul.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]