This is an automated email from the ASF dual-hosted git repository. cwylie pushed a commit to branch 0.15.1-incubating in repository https://gitbox.apache.org/repos/asf/incubator-druid.git
commit 36bc4e09d637871e96eae8958d851faa41b0b3b6 Author: Magnus Henoch <[email protected]> AuthorDate: Mon Jul 15 17:55:18 2019 +0100 Fix documentation formatting (#8079) The Markdown dialect used when publishing the documentation to the web site is much more sensitive than Github-flavoured Markdown. In particular, it requires an empty line before code blocks (unless the code block starts right after a heading), otherwise the code block gets formatted in-line with the previous paragraph. Likewise for bullet-point lists. --- .../extensions-contrib/distinctcount.md | 4 +- .../development/extensions-contrib/influx.md | 2 + .../extensions-contrib/materialized-view.md | 3 + .../extensions-contrib/momentsketch-quantiles.md | 6 + .../extensions-contrib/moving-average-query.md | 9 ++ .../extensions-core/druid-basic-security.md | 137 ++++++++++++++++++++- .../development/extensions-core/druid-lookups.md | 1 + docs/content/development/extensions-core/orc.md | 4 + docs/content/querying/filters.md | 6 + 9 files changed, 169 insertions(+), 3 deletions(-) diff --git a/docs/content/development/extensions-contrib/distinctcount.md b/docs/content/development/extensions-contrib/distinctcount.md index a392360..7cf67b5 100644 --- a/docs/content/development/extensions-contrib/distinctcount.md +++ b/docs/content/development/extensions-contrib/distinctcount.md @@ -28,8 +28,8 @@ To use this Apache Druid (incubating) extension, make sure to [include](../../op Additionally, follow these steps: -(1) First, use a single dimension hash-based partition spec to partition data by a single dimension. For example visitor_id. This to make sure all rows with a particular value for that dimension will go into the same segment, or this might over count. -(2) Second, use distinctCount to calculate the distinct count, make sure queryGranularity is divided exactly by segmentGranularity or else the result will be wrong. +1. 
First, use a single dimension hash-based partition spec to partition data by a single dimension. For example visitor_id. This to make sure all rows with a particular value for that dimension will go into the same segment, or this might over count. +2. Second, use distinctCount to calculate the distinct count, make sure queryGranularity is divided exactly by segmentGranularity or else the result will be wrong. There are some limitations, when used with groupBy, the groupBy keys' numbers should not exceed maxIntermediateRows in every segment. If exceeded the result will be wrong. When used with topN, numValuesPerPass should not be too big. If too big the distinctCount will use a lot of memory and might cause the JVM to go our of memory. diff --git a/docs/content/development/extensions-contrib/influx.md b/docs/content/development/extensions-contrib/influx.md index c5c071b..62e036b 100644 --- a/docs/content/development/extensions-contrib/influx.md +++ b/docs/content/development/extensions-contrib/influx.md @@ -35,6 +35,7 @@ A typical line looks like this: ```cpu,application=dbhost=prdb123,region=us-east-1 usage_idle=99.24,usage_user=0.55 1520722030000000000``` which contains four parts: + - measurement: A string indicating the name of the measurement represented (e.g. cpu, network, web_requests) - tags: zero or more key-value pairs (i.e. dimensions) - measurements: one or more key-value pairs; values can be numeric, boolean, or string @@ -43,6 +44,7 @@ which contains four parts: The parser extracts these fields into a map, giving the measurement the key `measurement` and the timestamp the key `_ts`. The tag and measurement keys are copied verbatim, so users should take care to avoid name collisions. It is up to the ingestion spec to decide which fields should be treated as dimensions and which should be treated as metrics (typically tags correspond to dimensions and measurements correspond to metrics). 
The parser is configured like so: + ```json "parser": { "type": "string", diff --git a/docs/content/development/extensions-contrib/materialized-view.md b/docs/content/development/extensions-contrib/materialized-view.md index 95bfde9..963a944 100644 --- a/docs/content/development/extensions-contrib/materialized-view.md +++ b/docs/content/development/extensions-contrib/materialized-view.md @@ -33,6 +33,7 @@ In materialized-view-maintenance, dataSouces user ingested are called "base-data The `derivativeDataSource` supervisor is used to keep the timeline of derived-dataSource consistent with base-dataSource. Each `derivativeDataSource` supervisor is responsible for one derived-dataSource. A sample derivativeDataSource supervisor spec is shown below: + ```json { "type": "derivativeDataSource", @@ -90,6 +91,7 @@ A sample derivativeDataSource supervisor spec is shown below: In materialized-view-selection, we implement a new query type `view`. When we request a view query, Druid will try its best to optimize the query based on query dataSource and intervals. A sample view query spec is shown below: + ```json { "queryType": "view", @@ -124,6 +126,7 @@ A sample view query spec is shown below: } } ``` + There are 2 parts in a view query: |Field|Description|Required| diff --git a/docs/content/development/extensions-contrib/momentsketch-quantiles.md b/docs/content/development/extensions-contrib/momentsketch-quantiles.md index 966caa2..3eeadaf 100644 --- a/docs/content/development/extensions-contrib/momentsketch-quantiles.md +++ b/docs/content/development/extensions-contrib/momentsketch-quantiles.md @@ -38,6 +38,7 @@ druid.extensions.loadList=["druid-momentsketch"] The result of the aggregation is a momentsketch that is the union of all sketches either built from raw data or read from the segments. The `momentSketch` aggregator operates over raw data while the `momentSketchMerge` aggregator should be used when aggregating pre-computed sketches. 
+ ```json { "type" : <aggregator_type>, @@ -59,6 +60,7 @@ The `momentSketch` aggregator operates over raw data while the `momentSketchMerg ### Post Aggregators Users can query for a set of quantiles using the `momentSketchSolveQuantiles` post-aggregator on the sketches created by the `momentSketch` or `momentSketchMerge` aggregators. + ```json { "type" : "momentSketchSolveQuantiles", @@ -69,6 +71,7 @@ Users can query for a set of quantiles using the `momentSketchSolveQuantiles` po ``` Users can also query for the min/max of a distribution: + ```json { "type" : "momentSketchMin" | "momentSketchMax", @@ -79,6 +82,7 @@ Users can also query for the min/max of a distribution: ### Example As an example of a query with sketches pre-aggregated at ingestion time, one could set up the following aggregator at ingest: + ```json { "type": "momentSketch", @@ -88,7 +92,9 @@ As an example of a query with sketches pre-aggregated at ingestion time, one cou "compress": true, } ``` + and make queries using the following aggregator + post-aggregator: + ```json { "aggregations": [{ diff --git a/docs/content/development/extensions-contrib/moving-average-query.md b/docs/content/development/extensions-contrib/moving-average-query.md index 5fc7268..7e028cc 100644 --- a/docs/content/development/extensions-contrib/moving-average-query.md +++ b/docs/content/development/extensions-contrib/moving-average-query.md @@ -33,6 +33,7 @@ These Aggregate Window Functions consume standard Druid Aggregators and outputs Moving Average encapsulates the [groupBy query](../../querying/groupbyquery.html) (Or [timeseries](../../querying/timeseriesquery.html) in case of no dimensions) in order to rely on the maturity of these query types. It runs the query in two main phases: + 1. Runs an inner [groupBy](../../querying/groupbyquery.html) or [timeseries](../../querying/timeseriesquery.html) query to compute Aggregators (i.e. daily count of events). 2. 
Passes over aggregated results in Broker, in order to compute Averagers (i.e. moving 7 day average of the daily count). @@ -110,6 +111,7 @@ These are properties which are common to all Averagers: #### Standard averagers These averagers offer four functions: + * Mean (Average) * MeanNoNulls (Ignores empty buckets). * Max @@ -121,6 +123,7 @@ In that case, the first records will ignore missing buckets and average won't be However, this also means that empty days in a sparse dataset will also be ignored. Example of usage: + ```json { "type" : "doubleMean", "name" : <output_name>, "fieldName": <input_name> } ``` @@ -130,6 +133,7 @@ This optional parameter is used to calculate over a single bucket within each cy A prime example would be weekly buckets, resulting in a Day of Week calculation. (Other examples: Month of year, Hour of day). I.e. when using these parameters: + * *granularity*: period=P1D (daily) * *buckets*: 28 * *cycleSize*: 7 @@ -146,6 +150,7 @@ All examples are based on the Wikipedia dataset provided in the Druid [tutorials Calculating a 7-buckets moving average for Wikipedia edit deltas. Query syntax: + ```json { "queryType": "movingAverage", @@ -176,6 +181,7 @@ Query syntax: ``` Result: + ```json [ { "version" : "v1", @@ -217,6 +223,7 @@ Result: Calculating a 7-buckets moving average for Wikipedia edit deltas, plus a ratio between the current period and the moving average. 
Query syntax: + ```json { "queryType": "movingAverage", @@ -264,6 +271,7 @@ Query syntax: ``` Result: + ```json [ { "version" : "v1", @@ -306,6 +314,7 @@ Result: Calculating an average of every first 10-minutes of the last 3 hours: Query syntax: + ```json { "queryType": "movingAverage", diff --git a/docs/content/development/extensions-core/druid-basic-security.md b/docs/content/development/extensions-core/druid-basic-security.md index 28eff1f..e067fdf 100644 --- a/docs/content/development/extensions-core/druid-basic-security.md +++ b/docs/content/development/extensions-core/druid-basic-security.md @@ -172,6 +172,90 @@ Return a list of all user names. `GET(/druid-ext/basic-security/authorization/db/{authorizerName}/users/{userName})` Return the name and role information of the user with name {userName} +Example output: + +```json +{ + "name": "druid2", + "roles": [ + "druidRole" + ] +} +``` + +This API supports the following flags: + +- `?full`: The response will also include the full information for each role currently assigned to the user. + +Example output: + +```json +{ + "name": "druid2", + "roles": [ + { + "name": "druidRole", + "permissions": [ + { + "resourceAction": { + "resource": { + "name": "A", + "type": "DATASOURCE" + }, + "action": "READ" + }, + "resourceNamePattern": "A" + }, + { + "resourceAction": { + "resource": { + "name": "C", + "type": "CONFIG" + }, + "action": "WRITE" + }, + "resourceNamePattern": "C" + } + ] + } + ] +} +``` + +The output format of this API when `?full` is specified is deprecated and in later versions will be switched to the output format used when both `?full` and `?simplifyPermissions` flag is set. + +The `resourceNamePattern` is a compiled version of the resource name regex. It is redundant and complicates the use of this API for clients such as frontends that edit the authorization configuration, as the permission format in this output does not match the format used for adding permissions to a role. 
+ +- `?full?simplifyPermissions`: When both `?full` and `?simplifyPermissions` are set, the permissions in the output will contain only a list of `resourceAction` objects, without the extraneous `resourceNamePattern` field. + +```json +{ + "name": "druid2", + "roles": [ + { + "name": "druidRole", + "users": null, + "permissions": [ + { + "resource": { + "name": "A", + "type": "DATASOURCE" + }, + "action": "READ" + }, + { + "resource": { + "name": "C", + "type": "CONFIG" + }, + "action": "WRITE" + } + ] + } + ] +} +``` + `POST(/druid-ext/basic-security/authorization/db/{authorizerName}/users/{userName})` Create a new user with name {userName} @@ -184,7 +268,58 @@ Delete the user with name {userName} Return a list of all role names. `GET(/druid-ext/basic-security/authorization/db/{authorizerName}/roles/{roleName})` -Return name and permissions for the role named {roleName} +Return name and permissions for the role named {roleName}. + +Example output: + +```json +{ + "name": "druidRole2", + "permissions": [ + { + "resourceAction": { + "resource": { + "name": "E", + "type": "DATASOURCE" + }, + "action": "WRITE" + }, + "resourceNamePattern": "E" + } + ] +} +``` + +The default output format of this API is deprecated and in later versions will be switched to the output format used when the `?simplifyPermissions` flag is set. The `resourceNamePattern` is a compiled version of the resource name regex. It is redundant and complicates the use of this API for clients such as frontends that edit the authorization configuration, as the permission format in this output does not match the format used for adding permissions to a role. + +This API supports the following flags: + +- `?full`: The output will contain an extra `users` list, containing the users that currently have this role. 
+ +```json +"users":["druid"] +``` + +- `?simplifyPermissions`: The permissions in the output will contain only a list of `resourceAction` objects, without the extraneous `resourceNamePattern` field. The `users` field will be null when `?full` is not specified. + +Example output: + +```json +{ + "name": "druidRole2", + "users": null, + "permissions": [ + { + "resource": { + "name": "E", + "type": "DATASOURCE" + }, + "action": "WRITE" + } + ] +} +``` + `POST(/druid-ext/basic-security/authorization/db/{authorizerName}/roles/{roleName})` Create a new role with name {roleName}. diff --git a/docs/content/development/extensions-core/druid-lookups.md b/docs/content/development/extensions-core/druid-lookups.md index 53476eb..9f5798e 100644 --- a/docs/content/development/extensions-core/druid-lookups.md +++ b/docs/content/development/extensions-core/druid-lookups.md @@ -75,6 +75,7 @@ Same for Loading cache, developer can implement a new type of loading cache by i ##### Example of Polling On-heap Lookup This example demonstrates a polling cache that will update its on-heap cache every 10 minutes + ```json { "type":"pollingLookup", diff --git a/docs/content/development/extensions-core/orc.md b/docs/content/development/extensions-core/orc.md index af7a315..791531d 100644 --- a/docs/content/development/extensions-core/orc.md +++ b/docs/content/development/extensions-core/orc.md @@ -269,6 +269,7 @@ This extension, first available in version 0.15.0, replaces the previous 'contri ingestion task is *incompatible*, and will need modified to work with the newer 'core' extension. 
To migrate to 0.15.0+: + * In `inputSpec` of `ioConfig`, `inputFormat` must be changed from `"org.apache.hadoop.hive.ql.io.orc.OrcNewInputFormat"` to `"org.apache.orc.mapreduce.OrcInputFormat"` * The 'contrib' extension supported a `typeString` property, which provided the schema of the @@ -276,6 +277,7 @@ ORC file, of which was essentially required to have the types correct, but notab facilitated column renaming. In the 'core' extension, column renaming can be achieved with [`flattenSpec` expressions](../../ingestion/flatten-json.html). For example, `"typeString":"struct<time:string,name:string>"` with the actual schema `struct<_col0:string,_col1:string>`, to preserve Druid schema would need replaced with: + ```json "flattenSpec": { "fields": [ @@ -293,10 +295,12 @@ with the actual schema `struct<_col0:string,_col1:string>`, to preserve Druid sc ... } ``` + * The 'contrib' extension supported a `mapFieldNameFormat` property, which provided a way to specify a dimension to flatten `OrcMap` columns with primitive types. This functionality has also been replaced with [`flattenSpec` expressions](../../ingestion/flatten-json.html). For example: `"mapFieldNameFormat": "<PARENT>_<CHILD>"` for a dimension `nestedData_dim1`, to preserve Druid schema could be replaced with + ```json "flattenSpec": { "fields": [ diff --git a/docs/content/querying/filters.md b/docs/content/querying/filters.md index 2f9b23a..53e0853 100644 --- a/docs/content/querying/filters.md +++ b/docs/content/querying/filters.md @@ -282,6 +282,7 @@ greater than, less than, greater than or equal to, less than or equal to, and "b Bound filters support the use of extraction functions, see [Filtering with Extraction Functions](#filtering-with-extraction-functions) for details. 
The following bound filter expresses the condition `21 <= age <= 31`: + ```json { "type": "bound", @@ -293,6 +294,7 @@ The following bound filter expresses the condition `21 <= age <= 31`: ``` This filter expresses the condition `foo <= name <= hoo`, using the default lexicographic sorting order. + ```json { "type": "bound", @@ -303,6 +305,7 @@ This filter expresses the condition `foo <= name <= hoo`, using the default lexi ``` Using strict bounds, this filter expresses the condition `21 < age < 31` + ```json { "type": "bound", @@ -316,6 +319,7 @@ Using strict bounds, this filter expresses the condition `21 < age < 31` ``` The user can also specify a one-sided bound by omitting "upper" or "lower". This filter expresses `age < 31`. + ```json { "type": "bound", @@ -327,6 +331,7 @@ The user can also specify a one-sided bound by omitting "upper" or "lower". This ``` Likewise, this filter expresses `age >= 18` + ```json { "type": "bound", @@ -355,6 +360,7 @@ The interval filter supports the use of extraction functions, see [Filtering wit If an extraction function is used with this filter, the extraction function should output values that are parseable as long milliseconds. The following example filters on the time ranges of October 1-7, 2014 and November 15-16, 2014. + ```json { "type" : "interval",
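The formatting rule stated in the commit message (an empty line before each code block unless the block directly follows a heading, and likewise before bullet-point lists) can be checked mechanically. The following is a minimal sketch in Python and is not part of the commit. The function name `find_missing_blank_lines`, the regular expressions, and the choice to apply the heading exception to lists as well are assumptions made for illustration. The sketch does not model the Markdown dialect actually used by the web site.

```python
import re
from typing import List, Tuple

# Hypothetical patterns, chosen for this sketch only.
FENCE = re.compile(r"^\s*```[^`]*$")            # a line that opens or closes a fence
BULLET = re.compile(r"^\s*([-*+]|\d+\.)\s+")    # bullet or numbered list item
HEADING = re.compile(r"^#{1,6}\s")              # ATX-style heading


def find_missing_blank_lines(text: str) -> List[Tuple[int, str]]:
    """Return (1-based line number, kind) for every opening code fence or
    first list item that directly follows a non-blank, non-heading line."""
    problems = []
    in_fence = False   # currently inside a fenced code block
    in_list = False    # currently inside a run of list items
    prev = ""          # previous line; empty at the start of the text
    for number, line in enumerate(text.splitlines(), start=1):
        follows_text = bool(prev.strip()) and not HEADING.match(prev)
        if FENCE.match(line):
            if not in_fence and follows_text:
                problems.append((number, "code block"))
            in_fence = not in_fence
        elif not in_fence:
            if not line.strip():
                in_list = False          # a blank line ends the current list
            elif BULLET.match(line):
                # Only the first item of a list needs a blank line before it.
                if follows_text and not in_list:
                    problems.append((number, "list"))
                in_list = True
        prev = line
    return problems


if __name__ == "__main__":
    mark = "`" * 3  # built at runtime to keep fence markers out of this listing
    sample = (
        f"Intro:\n{mark}json\n{{}}\n{mark}\n\n"
        "which contains four parts:\n- a\n- b\n"
    )
    for number, kind in find_missing_blank_lines(sample):
        print(f"line {number}: missing empty line before {kind}")
```

Run over the files listed in the diffstat, a check of this kind would flag the same places the commit fixes by adding `+` blank lines, though a real check should be validated against the publishing toolchain itself.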
