writer-jill commented on code in PR #13181: URL: https://github.com/apache/druid/pull/13181#discussion_r1008247507
########## docs/operations/rule-configuration.md: ########## @@ -22,213 +22,304 @@ title: "Retaining or automatically dropping data" ~ under the License. --> +Data retention rules allow you to configure Apache Druid to conform to your data retention policies. Your data retention policies specify which data to retain and which data to drop from the cluster. -In Apache Druid, Coordinator processes use rules to determine what data should be loaded to or dropped from the cluster. Rules are used for data retention and query execution, and are set via the [web console](./web-console.md). +Druid supports [load](#load-rules), [drop](#drop-rules), and [broadcast](#broadcast-rules) rules. Each rule is a JSON object. See the [rule definitions below](#load-rules) for details. -There are three types of rules, i.e., load rules, drop rules, and broadcast rules. Load rules indicate how segments should be assigned to different historical process tiers and how many replicas of a segment should exist in each tier. -Drop rules indicate when segments should be dropped entirely from the cluster. Finally, broadcast rules indicate how segments of different datasources should be co-located in Historical processes. +You can configure a default set of rules to apply to all datasources, and/or you can set specific rules for specific datasources. See [rule structure](#rule-structure) to see how rule order impacts the way the Coordinator applies retention rules. -The Coordinator loads a set of rules from the metadata storage. Rules may be specific to a certain datasource and/or a -default set of rules can be configured. Rules are read in order and hence the ordering of rules is important. The -Coordinator will cycle through all used segments and match each segment with the first rule that applies. Each segment -may only match a single rule. +You can specify the data to retain or drop in the following ways: -Note: It is recommended that the web console is used to configure rules. 
However, the Coordinator process does have HTTP endpoints to programmatically configure rules. +- Forever: all data in the segment. +- Period: segment data specified as an offset from the present time. +- Interval: a fixed time range. + +Retention rules are persistent: they remain in effect until you change them. Druid stores retention rules in its [metadata store](../dependencies/metadata-storage.md). + +## Set retention rules + +You can use the Druid [web console](./web-console.md) or the [Coordinator API](./api-reference.md#coordinator) to create and manage retention rules. + +### Use the web console + +To set retention rules in the Druid web console: + +1. On the console home page, click **Datasources**. +2. Click the name of your datasource to open the data window. +3. Select **Actions > Edit retention rules**. +4. Click **+New rule**. +5. Select a rule type and set properties according to the [rules reference](). +6. Click **Next** and enter a description for the rule. +7. Click **Save** to save and apply the rule to the datasource. + +### Use the Coordinator API + +To set one or more default retention rules for all datasources, send a POST request containing a JSON object for each rule to `/druid/coordinator/v1/rules/_default`. + +The following example request sets a default forever broadcast rule for all datasources: + +```bash +curl --location --request POST 'http://localhost:8888/druid/coordinator/v1/rules/_default' \ +--header 'Content-Type: application/json' \ +--data-raw '[{ + "type": "broadcastForever" + }]' +``` + +To set one or more retention rules for a specific datasource, send a POST request containing a JSON object for each rule to `/druid/coordinator/v1/rules/{datasourceName}`. 
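(Editor's aside, not part of the docs change under review: after POSTing rules, it can be useful to read back what the Coordinator actually stored. The Coordinator also serves rules for a single datasource at `/druid/coordinator/v1/rules/{datasourceName}`. The router address matches the examples in this page; the `wikipedia` datasource name is illustrative.)

```shell
# Read back the retention rules currently stored for one datasource.
# An empty array [] means no datasource-specific rules are set, and the
# cluster-wide default rules apply instead.
curl --location --request GET 'http://localhost:8888/druid/coordinator/v1/rules/wikipedia'
```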
+
+The following example request sets a period drop rule and a period broadcast rule for the `wikipedia` datasource:
+
+```bash
+curl --location --request POST 'http://localhost:8888/druid/coordinator/v1/rules/wikipedia' \
+--header 'Content-Type: application/json' \
+--data-raw '[{
+    "type": "dropByPeriod",
+    "period": "P1M",
+    "includeFuture": true
+  },
+  {
+    "type": "broadcastByPeriod",
+    "period": "P1M",
+    "includeFuture": true
+  }]'
+```
+
+To retrieve all rules for all datasources, send a GET request to `/druid/coordinator/v1/rules`. For example:
+
+```bash
+curl --location --request GET 'http://localhost:8888/druid/coordinator/v1/rules'
+```
+
+### Rule structure
+
+The rules API accepts an array of rules as JSON objects. The JSON object you send in the API request for each rule is specific to the rule types outlined below.
+
+> You must pass the entire array of rules, in your desired order, with each API request. Each POST request to the rules API overwrites the existing rules for the specified datasource.
+
+The order of rules is important. The Coordinator reads rules in the order in which they appear in the rules list. For example, in the following screenshot the Coordinator evaluates data against rule 1, then rule 2, then rule 3:
+
+The Coordinator cycles through all used segments and matches each segment with the first rule that applies. Each segment can only match a single rule.
+
+In the web console you can use the up and down arrows on the right side of the interface to change the order of the rules.
 
 ## Load rules
 
-Load rules indicate how many replicas of a segment should exist in a server tier. **Please note**: If a Load rule is used to retain only data from a certain interval or period, it must be accompanied by a Drop rule. If a Drop rule is not included, data not within the specified interval or period will be retained by the default rule (loadForever).
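(Editor's aside, not part of the docs change under review: the load-plus-drop pairing described above can be sketched as a single ordered rules array. The `hot` tier name and replica counts here are illustrative, not from the Druid docs. The `loadByPeriod` rule keeps the most recent month on the tiers listed in `tieredReplicants`; the trailing `dropForever` rule drops every segment the first rule did not match.)

```json
[
  {
    "type": "loadByPeriod",
    "period": "P1M",
    "includeFuture": true,
    "tieredReplicants": {
      "hot": 2,
      "_default_tier": 1
    }
  },
  {
    "type": "dropForever"
  }
]
```

Because the Coordinator matches each segment against the first applicable rule, the order matters: if `dropForever` came first, it would match every segment and the load rule would never apply.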
+Load rules define how Druid assigns segments to [historical process tiers](./mixed-workloads.md#historical-tiering), and how many replicas of a segment exist in each tier.
+
+If you have a single tier, Druid automatically names the tier `_default` and loads all segments onto it. If you define an additional tier, you must define a load rule to specify which segments to load on that tier. Until you define a load rule, your new tier remains empty.
 
-### Forever Load Rule
+### Forever load rule
 
-Forever load rules are of the form:
+The forever load rule assigns segment data to specified tiers. It is the default rule Druid applies to datasources. Forever load rules have type `loadForever`.

Review Comment:
   Updated.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
