writer-jill commented on code in PR #13181:
URL: https://github.com/apache/druid/pull/13181#discussion_r1005736908
##########
docs/operations/rule-configuration.md:
##########
@@ -23,212 +23,242 @@ title: "Retaining or automatically dropping data"
-->
-In Apache Druid, Coordinator processes use rules to determine what data should
be loaded to or dropped from the cluster. Rules are used for data retention and
query execution, and are set via the [web console](./web-console.md).
+In Apache Druid, [Coordinator processes](../design/coordinator.md) use rules
to determine what data to retain in or drop from the cluster.
-There are three types of rules, i.e., load rules, drop rules, and broadcast
rules. Load rules indicate how segments should be assigned to different
historical process tiers and how many replicas of a segment should exist in
each tier.
-Drop rules indicate when segments should be dropped entirely from the cluster.
Finally, broadcast rules indicate how segments of different datasources should
be co-located in Historical processes.
+There are three types of rules: [load](#load-rules), [drop](#drop-rules), and
[broadcast](#broadcast-rules). See the sections below for more information on
each type.
-The Coordinator loads a set of rules from the metadata storage. Rules may be
specific to a certain datasource and/or a
-default set of rules can be configured. Rules are read in order and hence the
ordering of rules is important. The
-Coordinator will cycle through all used segments and match each segment with
the first rule that applies. Each segment
-may only match a single rule.
+The Coordinator loads a set of rules from the metadata storage. You can
configure a default set of rules to apply to all data sources, and/or you can
set specific rules for specific data sources.
-Note: It is recommended that the web console is used to configure rules.
However, the Coordinator process does have HTTP endpoints to programmatically
configure rules.
+## Set a rule
+
+To set a default retention rule for all data sources, send a POST request to
the following API:
+
+`/druid/coordinator/v1/rules/_default`
+
+To set a retention rule for a specific data source, send a POST request to the
following API:
+
+`/druid/coordinator/v1/rules/{dataSourceName}`
+
+The rules API accepts a list of rules. The payload you send in the API request
for each rule is specific to the rule types outlined below.
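For example, a minimal request body for either endpoint is a JSON array, even
when it contains a single rule (the tier name and replica count here are
illustrative):

```json
[
  {
    "type": "loadForever",
    "tieredReplicants": {
      "_default_tier": 2
    }
  }
]
```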
+
+You can also set rules using the [web console](./web-console.md). Go into a
data source and select **Actions** > **Edit retention rules**.
+
+### Rule order
+
+The order of rules is important. The Coordinator reads rules in the order in
which they appear in the rules list. For example, in the following screenshot,
the Coordinator evaluates data against rule 1, then rule 2, then rule 3:
+
+
+
+In the web console you can use the up and down arrows on the right side of the
interface to change the order of the rules.
+
+The Coordinator cycles through all used segments and matches each segment with
the first rule that applies. Each segment can only match a single rule.
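To illustrate first-match evaluation, consider this hypothetical two-rule list:
a segment from the past month matches the first rule and is loaded; any older
segment falls through to the second rule and is dropped (the tier name and
replica count are illustrative):

```json
[
  {
    "type": "loadByPeriod",
    "period": "P1M",
    "includeFuture": true,
    "tieredReplicants": {
      "_default_tier": 2
    }
  },
  {
    "type": "dropForever"
  }
]
```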
## Load rules
-Load rules indicate how many replicas of a segment should exist in a server
tier. **Please note**: If a Load rule is used to retain only data from a
certain interval or period, it must be accompanied by a Drop rule. If a Drop
rule is not included, data not within the specified interval or period will be
retained by the default rule (loadForever).
+Load rules define how Druid assigns segments to historical process tiers, and
how many replicas of a segment exist in each tier.
+
+If you want to use a load rule to retain only data from a defined period of
time, you must also define a drop rule. If you don't define a drop rule, Druid
retains data that doesn't lie within your defined period according to the
default rule, `loadForever`.
-### Forever Load Rule
+### Forever load rule
-Forever load rules are of the form:
+Forever load rules have type `loadForever` and the following example API
payload:
```json
{
- "type" : "loadForever",
+ "type": "loadForever",
"tieredReplicants": {
"hot": 1,
- "_default_tier" : 1
+ "_default_tier": 1
}
}
```
+Set the following property:
+- `tieredReplicants`: a JSON object containing tier names and the number of
replicas for each tier.
-* `type` - this should always be "loadForever"
-* `tieredReplicants` - A JSON Object where the keys are the tier names and
values are the number of replicas for that tier.
+The forever load rule is the default rule Druid applies to data sources.
+### Interval load rule
-### Interval Load Rule
+Interval load rules have type `loadByInterval` and the following example API
payload:
-Interval load rules are of the form:
```json
{
- "type" : "loadByInterval",
+ "type": "loadByInterval",
"interval": "2012-01-01/2013-01-01",
"tieredReplicants": {
"hot": 1,
- "_default_tier" : 1
+ "_default_tier": 1
}
}
```
-* `type` - this should always be "loadByInterval"
-* `interval` - A JSON Object representing ISO-8601 Intervals
-* `tieredReplicants` - A JSON Object where the keys are the tier names and
values are the number of replicas for that tier.
+Set the following properties:
+- `interval`: an [ISO-8601](https://en.wikipedia.org/wiki/ISO_8601) interval
string, such as `2012-01-01/2013-01-01`.
+- `tieredReplicants`: a JSON object containing tier names and the number of
replicas for each tier.
-### Period Load Rule
+### Period load rule
-Period load rules are of the form:
+Period load rules have type `loadByPeriod` and the following example API
payload:
```json
{
- "type" : "loadByPeriod",
- "period" : "P1M",
- "includeFuture" : true,
+ "type": "loadByPeriod",
+ "period": "P1M",
+ "includeFuture": true,
"tieredReplicants": {
"hot": 1,
- "_default_tier" : 1
+ "_default_tier": 1
}
}
```
-* `type` - this should always be "loadByPeriod"
-* `period` - A JSON Object representing ISO-8601 Periods
-* `includeFuture` - A JSON Boolean indicating whether the load period should
include the future. This property is optional, Default is true.
-* `tieredReplicants` - A JSON Object where the keys are the tier names and
values are the number of replicas for that tier.
+Set the following properties:
+- `period`: an [ISO-8601](https://en.wikipedia.org/wiki/ISO_8601) period
string, such as `P1M`. The period extends from some time in the past to the
future or to the current time, depending on the `includeFuture` flag.
+- `includeFuture`: a boolean flag to indicate whether the load period includes
the future. Defaults to `true`.
+- `tieredReplicants`: a JSON object containing tier names and the number of
replicas for each tier.
-The interval of a segment will be compared against the specified period. The
period is from some time in the past to the future or to the current time,
which depends on `includeFuture` is true or false. The rule matches if the
period *overlaps* the interval.
+Druid compares a segment's interval to the period you specify in the rule. The
rule matches if the period overlaps the segment interval.
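The overlap check can be sketched as follows. This is a hypothetical
illustration, not Druid's actual implementation; the function name is made up,
and the period `P1M` is approximated as 30 days:

```python
from datetime import datetime, timedelta

# Hypothetical sketch of loadByPeriod matching: the rule's period spans
# [now - period, now], or extends to the far future when includeFuture is
# true, and the rule matches when that span *overlaps* the segment interval.
def period_load_matches(seg_start, seg_end, now, period, include_future=True):
    period_start = now - period
    period_end = datetime.max if include_future else now
    # Overlap: the two intervals share at least one instant.
    return seg_start < period_end and seg_end > period_start

now = datetime(2013, 1, 1)
one_month = timedelta(days=30)  # rough stand-in for the ISO-8601 period "P1M"

# A segment from late December overlaps the trailing one-month window.
print(period_load_matches(datetime(2012, 12, 20), datetime(2012, 12, 21),
                          now, one_month))  # True
# A segment from 2010 does not.
print(period_load_matches(datetime(2010, 1, 1), datetime(2010, 1, 2),
                          now, one_month))  # False
```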
-## Drop Rules
+## Drop rules
-Drop rules indicate when segments should be dropped from the cluster.
+Drop rules define when Druid drops segments from the cluster.
-### Forever Drop Rule
+### Forever drop rule
-Forever drop rules are of the form:
+Forever drop rules have type `dropForever`:
```json
{
- "type" : "dropForever"
+ "type": "dropForever"
}
```
-* `type` - this should always be "dropForever"
+Druid drops all segments that match this rule from the cluster.
-All segments that match this rule are dropped from the cluster.
+### Interval drop rule
-
-### Interval Drop Rule
-
-Interval drop rules are of the form:
+Interval drop rules have type `dropByInterval` and the following example API
payload:
```json
{
- "type" : "dropByInterval",
- "interval" : "2012-01-01/2013-01-01"
+ "type": "dropByInterval",
+ "interval": "2012-01-01/2013-01-01"
}
```
-* `type` - this should always be "dropByInterval"
-* `interval` - A JSON Object representing ISO-8601 Periods
+Set the following property:
+- `interval`: an [ISO-8601](https://en.wikipedia.org/wiki/ISO_8601) interval
string, such as `2012-01-01/2013-01-01`.
-A segment is dropped if the interval contains the interval of the segment.
+Druid drops a segment if the specified interval contains the segment's interval.
-### Period Drop Rule
+### Period drop rule
-Period drop rules are of the form:
+Period drop rules have type `dropByPeriod` and the following example API
payload:
```json
{
- "type" : "dropByPeriod",
- "period" : "P1M",
- "includeFuture" : true
+ "type": "dropByPeriod",
+ "period": "P1M",
+ "includeFuture": true
}
```
-* `type` - this should always be "dropByPeriod"
-* `period` - A JSON Object representing ISO-8601 Periods
-* `includeFuture` - A JSON Boolean indicating whether the load period should
include the future. This property is optional, Default is true.
+Set the following properties:
+- `period`: an [ISO-8601](https://en.wikipedia.org/wiki/ISO_8601) period
string, such as `P1M`. The period extends from some time in the past to the
future or to the current time, depending on the `includeFuture` flag.
+- `includeFuture`: a boolean flag to indicate whether the drop period includes
the future. Defaults to `true`.
-The interval of a segment will be compared against the specified period. The
period is from some time in the past to the future or to the current time,
which depends on `includeFuture` is true or false. The rule matches if the
period *contains* the interval. This drop rule always dropping recent data.
+Druid compares a segment's interval to the period you specify in the rule. The
rule matches if the period contains the segment interval. This rule always
drops recent data.
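The containment check can be sketched as follows. This is a hypothetical
illustration, not Druid's actual implementation; the function name is made up,
and the period `P1M` is approximated as 30 days. Note the contrast with load
rules, which match on overlap rather than containment:

```python
from datetime import datetime, timedelta

# Hypothetical sketch of dropByPeriod matching: the rule's period spans
# [now - period, now], or extends to the far future when includeFuture is
# true, and the rule matches only when that span *contains* the whole
# segment interval.
def period_drop_matches(seg_start, seg_end, now, period, include_future=True):
    period_start = now - period
    period_end = datetime.max if include_future else now
    # Containment: the period must cover the entire segment interval.
    return period_start <= seg_start and seg_end <= period_end

now = datetime(2013, 1, 1)
one_month = timedelta(days=30)  # rough stand-in for the ISO-8601 period "P1M"

# A fully recent segment is contained, so the rule drops it.
print(period_drop_matches(datetime(2012, 12, 20), datetime(2012, 12, 21),
                          now, one_month))  # True
# A segment straddling the start of the window is not contained
# (it would still *overlap*, so a load rule with the same period would match).
print(period_drop_matches(datetime(2012, 11, 1), datetime(2012, 12, 21),
                          now, one_month))  # False
```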
-### Period Drop Before Rule
+### Period drop before rule
-Period drop before rules are of the form:
+Period drop before rules have type `dropBeforeByPeriod` and the following
example API payload:
```json
{
- "type" : "dropBeforeByPeriod",
- "period" : "P1M"
+ "type": "dropBeforeByPeriod",
+ "period": "P1M"
}
```
-* `type` - this should always be "dropBeforeByPeriod"
-* `period` - A JSON Object representing ISO-8601 Periods
+Set the following property:
+- `period`: an [ISO-8601](https://en.wikipedia.org/wiki/ISO_8601) period
string, such as `P1M`. The period extends from some time in the past to the
current time.
+
+Druid compares a segment's interval to the period you specify in the rule. The
rule matches if the segment interval is before the specified period.
-The interval of a segment will be compared against the specified period. The
period is from some time in the past to the current time. The rule matches if
the interval before the period. If you just want to retain recent data, you can
use this rule to drop the old data that before a specified period and add a
`loadForever` rule to follow it. Notes, `dropBeforeByPeriod + loadForever` is
equivalent to `loadByPeriod(includeFuture = true) + dropForever`.
+If you only want to retain recent data, you can use this rule to drop old data
before a specified period, and add a `loadForever` rule to retain the data that
follows it. Note that the rule combination `dropBeforeByPeriod` + `loadForever`
is equivalent to `loadByPeriod(includeFuture = true)` + `dropForever`.
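For example, a hypothetical rules list implementing the first combination,
which retains one month of recent data and drops everything older (the tier
name and replica count are illustrative):

```json
[
  {
    "type": "dropBeforeByPeriod",
    "period": "P1M"
  },
  {
    "type": "loadForever",
    "tieredReplicants": {
      "_default_tier": 2
    }
  }
]
```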
-## Broadcast Rules
+## Broadcast rules
-Broadcast rules indicate that segments of a data source should be loaded by
all servers of a cluster of the following types: historicals, brokers, tasks,
and indexers.
+Broadcast rules instruct Druid to load segments of a data source in all
brokers, historicals, tasks, and indexers in the cluster.
-Note that the broadcast segments are only directly queryable through the
historicals, but they are currently loaded on other server types to support
join queries.
+Note that the broadcast segments are only directly queryable through the
historicals, but Druid loads them on other server types to support join queries.
Review Comment:
Updated, although I can't find any document that talks about the index table
feature. Will ask Paul.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]