This is an automated email from the ASF dual-hosted git repository.
cwylie pushed a commit to branch 0.16.0-incubating
in repository https://gitbox.apache.org/repos/asf/incubator-druid.git
The following commit(s) were added to refs/heads/0.16.0-incubating by this push:
new 081aee6 fix doc headers (#8729) (#8737)
081aee6 is described below
commit 081aee66a775e2320b94e451f04bc486f8cad0b9
Author: Clint Wylie <[email protected]>
AuthorDate: Fri Oct 25 01:14:42 2019 -0700
fix doc headers (#8729) (#8737)
---
docs/dependencies/zookeeper.md | 6 +--
docs/querying/groupbyquery.md | 70 +++++++++++++++++++++--------------
docs/querying/searchquery.md | 4 +-
docs/querying/segmentmetadataquery.md | 30 +++++++--------
docs/querying/timeseriesquery.md | 4 +-
docs/querying/topnquery.md | 66 ++++++++++++++++-----------------
website/i18n/en.json | 6 +--
7 files changed, 101 insertions(+), 85 deletions(-)
diff --git a/docs/dependencies/zookeeper.md b/docs/dependencies/zookeeper.md
index 58d571d..300ef74 100644
--- a/docs/dependencies/zookeeper.md
+++ b/docs/dependencies/zookeeper.md
@@ -31,7 +31,7 @@ Apache Druid (incubating) uses [Apache
ZooKeeper](http://zookeeper.apache.org/)
4. [Overlord](../design/overlord.md) leader election
5. [Overlord](../design/overlord.md) and
[MiddleManager](../design/middlemanager.md) task management
-### Coordinator Leader Election
+## Coordinator Leader Election
We use the Curator LeadershipLatch recipe to do leader election at path
@@ -39,7 +39,7 @@ We use the Curator LeadershipLatch recipe to do leader
election at path
${druid.zk.paths.coordinatorPath}/_COORDINATOR
```
-### Segment "publishing" protocol from Historical and Realtime
+## Segment "publishing" protocol from Historical and Realtime
The `announcementsPath` and `servedSegmentsPath` are used for this.
@@ -63,7 +63,7 @@
${druid.zk.paths.servedSegmentsPath}/${druid.host}/_segment_identifier_
Processes like the [Coordinator](../design/coordinator.md) and
[Broker](../design/broker.md) can then watch these paths to see which processes
are currently serving which segments.
-### Segment load/drop protocol between Coordinator and Historical
+## Segment load/drop protocol between Coordinator and Historical
The `loadQueuePath` is used for this.
diff --git a/docs/querying/groupbyquery.md b/docs/querying/groupbyquery.md
index 5e3e7cf..1e5c13e 100644
--- a/docs/querying/groupbyquery.md
+++ b/docs/querying/groupbyquery.md
@@ -122,7 +122,7 @@ To pull it all together, the above query would return
*n\*m* data points, up to
]
```
-### Behavior on multi-value dimensions
+## Behavior on multi-value dimensions
groupBy queries can group on multi-value dimensions. When grouping on a
multi-value dimension, _all_ values
from matching rows will be used to generate one group per value. It's possible
for a query to return more groups than
@@ -133,7 +133,7 @@ improve performance.
See [Multi-value dimensions](multi-value-dimensions.html) for more details.
-### More on subtotalsSpec
+## More on subtotalsSpec
The subtotals feature allows computation of multiple sub-groupings in a single
query. To use this feature, add a "subtotalsSpec" to your query, which should
be a list of subgroup dimension sets. It should contain the "outputName" from
dimensions in your "dimensions" attribute, in the same order as they appear in
the "dimensions" attribute (although, of course, you may skip some). For
example, consider a groupBy query like this one:
```json
@@ -219,9 +219,9 @@ Response for above query would look something like below...
]
```
-### Implementation details
+## Implementation details
-#### Strategies
+### Strategies
GroupBy queries can be executed using two different strategies. The default
strategy for a cluster is determined by the
"druid.query.groupBy.defaultStrategy" runtime property on the Broker. This can
be overridden using "groupByStrategy" in
@@ -242,7 +242,7 @@ merging is always single-threaded. Because the Broker
merges results using the i
the full result set before returning any results. On both the data processes
and the Broker, the merging index is fully
on-heap by default, but it can optionally store aggregated values off-heap.
-#### Differences between v1 and v2
+### Differences between v1 and v2
Query API and results are compatible between the two engines; however, there
are some differences from a cluster
configuration perspective:
@@ -263,30 +263,30 @@ ignores chunkPeriod.
when the grouping key is a single indexed string column. In array-based
aggregation, the dictionary-encoded value is used
as the index, so the aggregated values in the array can be accessed directly
without finding buckets based on hashing.
-#### Memory tuning and resource limits
+### Memory tuning and resource limits
When using groupBy v2, three parameters control resource usage and limits:
-- druid.processing.buffer.sizeBytes: size of the off-heap hash table used for aggregation, per query, in bytes. At
-most druid.processing.numMergeBuffers of these will be created at once, which also serves as an upper limit on the
+- `druid.processing.buffer.sizeBytes`: size of the off-heap hash table used for aggregation, per query, in bytes. At
+most `druid.processing.numMergeBuffers` of these will be created at once, which also serves as an upper limit on the
number of concurrently running groupBy queries.
-- druid.query.groupBy.maxMergingDictionarySize: size of the on-heap dictionary used when grouping on strings, per query,
+- `druid.query.groupBy.maxMergingDictionarySize`: size of the on-heap dictionary used when grouping on strings, per query,
in bytes. Note that this is based on a rough estimate of the dictionary size, not the actual size.
-- druid.query.groupBy.maxOnDiskStorage: amount of space on disk used for aggregation, per query, in bytes. By default,
+- `druid.query.groupBy.maxOnDiskStorage`: amount of space on disk used for aggregation, per query, in bytes. By default,
this is 0, which means aggregation will not use disk.
-If maxOnDiskStorage is 0 (the default) then a query that exceeds either the on-heap dictionary limit, or the off-heap
+If `maxOnDiskStorage` is 0 (the default) then a query that exceeds either the on-heap dictionary limit, or the off-heap
aggregation table limit, will fail with a "Resource limit exceeded" error describing the limit that was exceeded.
-If maxOnDiskStorage is greater than 0, queries that exceed the in-memory limits will start using disk for aggregation.
+If `maxOnDiskStorage` is greater than 0, queries that exceed the in-memory limits will start using disk for aggregation.
In this case, when either the on-heap dictionary or off-heap hash table fills up, partially aggregated records will be
sorted and flushed to disk. Then, both in-memory structures will be cleared out for further aggregation. Queries that
-then go on to exceed maxOnDiskStorage will fail with a "Resource limit exceeded" error indicating that they ran out of
+then go on to exceed `maxOnDiskStorage` will fail with a "Resource limit exceeded" error indicating that they ran out of
disk space.
With groupBy v2, cluster operators should make sure that the off-heap hash tables and on-heap merging dictionaries
will not exceed available memory for the maximum possible concurrent query load (given by
-druid.processing.numMergeBuffers). See the [basic cluster tuning guide](../operations/basic-cluster-tuning.md)
+`druid.processing.numMergeBuffers`). See the [basic cluster tuning guide](../operations/basic-cluster-tuning.md)
for more details about direct memory usage, organized by Druid process type.
Brokers do not need merge buffers for basic groupBy queries. Queries with
subqueries (using a `query` dataSource) require one merge buffer if there is a
single subquery, or two merge buffers if there is more than one layer of nested
subqueries. Queries with [subtotals](groupbyquery.html#more-on-subtotalsspec)
need one merge buffer. These can stack on top of each other: a groupBy query
with multiple layers of nested subqueries, and that also uses subtotals, will
need three merge buffers.
@@ -294,26 +294,26 @@ Brokers do not need merge buffers for basic groupBy
queries. Queries with subque
Historicals and ingestion tasks need one merge buffer for each groupBy query,
unless [parallel combination](groupbyquery.html#parallel-combine) is enabled,
in which case they need two merge buffers per query.
When using groupBy v1, all aggregation is done on-heap, and resource limits
are done through the parameter
-druid.query.groupBy.maxResults. This is a cap on the maximum number of results in a result set. Queries that exceed
+`druid.query.groupBy.maxResults`. This is a cap on the maximum number of results in a result set. Queries that exceed
this limit will fail with a "Resource limit exceeded" error indicating they
exceeded their row limit. Cluster
operators should make sure that the on-heap aggregations will not exceed
available JVM heap space for the expected
concurrent query load.
-#### Performance tuning for groupBy v2
+### Performance tuning for groupBy v2
-##### Limit pushdown optimization
+#### Limit pushdown optimization
Druid pushes down the `limit` spec in groupBy queries to the segments on
Historicals wherever possible to early prune unnecessary intermediate results
and minimize the amount of data transferred to Brokers. By default, this
technique is applied only when all fields in the `orderBy` spec is a subset of
the grouping keys. This is because the `limitPushDown` doesn't guarantee the
exact results if the `orderBy` spec includes any fields that are not in the
grouping keys. However, you can enab [...]
-##### Optimizing hash table
+#### Optimizing hash table
The groupBy v2 engine uses an open addressing hash table for aggregation. The
hash table is initialized with a given initial bucket number and gradually
grows on buffer full. On hash collisions, the linear probing technique is used.
The default number of initial buckets is 1024 and the default max load factor
of the hash table is 0.7. If you can see too many collisions in the hash table,
you can adjust these numbers. See `bufferGrouperInitialBuckets` and
`bufferGrouperMaxLoadFactor` in [Advanced groupBy v2
configurations](#groupby-v2-configurations).
-##### Parallel combine
+#### Parallel combine
Once a Historical finishes aggregation using the hash table, it sorts the
aggregated results and merges them before sending to the
Broker for N-way merge aggregation in the broker. By default, Historicals use
all their available processing threads
@@ -341,7 +341,7 @@ Please note that each Historical needs two merge buffers to
process a groupBy v2
computing intermediate aggregates from each segment and another for combining
intermediate aggregates in parallel.
-#### Alternatives
+### Alternatives
There are some situations where other query types may be a better choice than
groupBy.
@@ -353,7 +353,7 @@ advantage of the fact that segments are already sorted on
time) and does not nee
will sometimes be faster than groupBy. This is especially true if you are
ordering by a metric and find approximate
results acceptable.
-#### Nested groupBys
+### Nested groupBys
Nested groupBys (dataSource of type "query") are performed differently for
"v1" and "v2". The Broker first runs the
inner groupBy query in the usual way. "v1" strategy then materializes the
inner query's results on-heap with Druid's
@@ -361,11 +361,11 @@ indexing mechanism, and runs the outer query on these
materialized results. "v2"
inner query's results stream with off-heap fact map and on-heap string
dictionary that can spill to disk. Both
strategy perform the outer query on the Broker in a single-threaded fashion.
-#### Configurations
+### Configurations
This section describes the configurations for groupBy queries. You can set the
runtime properties in the `runtime.properties` file on Broker, Historical, and
MiddleManager processes. You can set the query context parameters through the
[query context](query-context.html).
-##### Configurations for groupBy v2
+#### Configurations for groupBy v2
Supported runtime properties:
@@ -382,9 +382,9 @@ Supported query contexts:
|`maxOnDiskStorage`|Can be used to lower the value of
`druid.query.groupBy.maxOnDiskStorage` for this query.|
-#### Advanced configurations
+### Advanced configurations
-##### Common configurations for all groupBy strategies
+#### Common configurations for all groupBy strategies
Supported runtime properties:
@@ -401,7 +401,7 @@ Supported query contexts:
|`groupByIsSingleThreaded`|Overrides the value of
`druid.query.groupBy.singleThreaded` for this query.|
-##### GroupBy v2 configurations
+#### GroupBy v2 configurations
Supported runtime properties:
@@ -426,7 +426,7 @@ Supported query contexts:
|`forceLimitPushDown`|When all fields in the orderby are part of the grouping
key, the Broker will push limit application down to the Historical processes.
When the sorting order uses fields that are not in the grouping key, applying
this optimization can result in approximate results with unknown accuracy, so
this optimization is disabled by default in that case. Enabling this context
flag turns on limit push down for limit/orderbys that contain non-grouping key
columns.|false|
-##### GroupBy v1 configurations
+#### GroupBy v1 configurations
Supported runtime properties:
@@ -442,3 +442,19 @@ Supported query contexts:
|`maxIntermediateRows`|Can be used to lower the value of
`druid.query.groupBy.maxIntermediateRows` for this query.|None|
|`maxResults`|Can be used to lower the value of
`druid.query.groupBy.maxResults` for this query.|None|
|`useOffheap`|Set to true to store aggregations off-heap when merging
results.|false|
+
+#### Array based result rows
+
+Internally Druid always uses an array based representation of groupBy result rows, but by default this is translated
+into a map based result format at the Broker. To reduce the overhead of this translation, results may also be returned
+from the Broker directly in the array based format if `resultAsArray` is set to `true` on the query context.
+
+Each row is positional, and has the following fields, in order:
+
+* Timestamp (optional; only if granularity != ALL)
+* Dimensions (in order)
+* Aggregators (in order)
+* Post-aggregators (optional; in order, if present)
+
+This schema is not available on the response, so it must be computed from the issued query in order to properly read
+the results.
diff --git a/docs/querying/searchquery.md b/docs/querying/searchquery.md
index 2077a09..71494a5 100644
--- a/docs/querying/searchquery.md
+++ b/docs/querying/searchquery.md
@@ -124,7 +124,7 @@ only the rows which satisfy those filters, thereby saving
I/O cost. However, it
and cursor-based execution plans, and chooses the optimal one. Currently, it
is not enabled by default due to the overhead
of cost estimation.
-#### Server configuration
+## Server configuration
The following runtime properties apply:
@@ -132,7 +132,7 @@ The following runtime properties apply:
|--------|-----------|-------|
|`druid.query.search.searchStrategy`|Default search query strategy.|useIndexes|
-#### Query context
+## Query context
The following query context parameters apply:
diff --git a/docs/querying/segmentmetadataquery.md b/docs/querying/segmentmetadataquery.md
index f23ce3e..e6a2651 100644
--- a/docs/querying/segmentmetadataquery.md
+++ b/docs/querying/segmentmetadataquery.md
@@ -89,18 +89,18 @@ undefined.
Only columns which are dimensions (i.e., have type `STRING`) will have any
cardinality. Rest of the columns (timestamp and metric columns) will show
cardinality as `null`.
-### intervals
+## intervals
If an interval is not specified, the query will use a default interval that
spans a configurable period before the end time of the most recent segment.
The length of this default time period is set in the Broker configuration via:
druid.query.segmentMetadata.defaultHistory
-### toInclude
+## toInclude
There are 3 types of toInclude objects.
-#### All
+### All
The grammar is as follows:
@@ -108,7 +108,7 @@ The grammar is as follows:
"toInclude": { "type": "all"}
```
-#### None
+### None
The grammar is as follows:
@@ -116,7 +116,7 @@ The grammar is as follows:
"toInclude": { "type": "none"}
```
-#### List
+### List
The grammar is as follows:
@@ -124,7 +124,7 @@ The grammar is as follows:
"toInclude": { "type": "list", "columns": [<string list of column names>]}
```
-### analysisTypes
+## analysisTypes
This is a list of properties that determines the amount of information
returned about the columns, i.e. analyses to be performed on the columns.
@@ -135,32 +135,32 @@ The default analysis types can be set in the Broker
configuration via:
Types of column analyses are described below:
-#### cardinality
+### cardinality
* `cardinality` in the result will return the estimated floor of cardinality
for each column. Only relevant for
dimension columns.
-#### minmax
+### minmax
* Estimated min/max values for each column. Only relevant for dimension
columns.
-#### size
+### size
* `size` in the result will contain the estimated total segment byte size as
if the data were stored in text format
-#### interval
+### interval
* `intervals` in the result will contain the list of intervals associated with
the queried segments.
-#### timestampSpec
+### timestampSpec
* `timestampSpec` in the result will contain timestampSpec of data stored in
segments. this can be null if timestampSpec of segments was unknown or
unmergeable (if merging is enabled).
-#### queryGranularity
+### queryGranularity
* `queryGranularity` in the result will contain query granularity of data
stored in segments. this can be null if query granularity of segments was
unknown or unmergeable (if merging is enabled).
-#### aggregators
+### aggregators
* `aggregators` in the result will contain the list of aggregators usable for
querying metric columns. This may be
null if the aggregators are unknown or unmergeable (if merging is enabled).
@@ -169,12 +169,12 @@ null if the aggregators are unknown or unmergeable (if
merging is enabled).
* The form of the result is a map of column name to aggregator.
-#### rollup
+### rollup
* `rollup` in the result is true/false/null.
* When merging is enabled, if some are rollup, others are not, result is null.
-### lenientAggregatorMerge
+## lenientAggregatorMerge
Conflicts between aggregator metadata across segments can occur if some
segments have unknown aggregators, or if
two segments use incompatible aggregators for the same column (e.g. longSum
changed to doubleSum).
diff --git a/docs/querying/timeseriesquery.md b/docs/querying/timeseriesquery.md
index 41ca0c7..69fdaa2 100644
--- a/docs/querying/timeseriesquery.md
+++ b/docs/querying/timeseriesquery.md
@@ -94,7 +94,7 @@ To pull it all together, the above query would return 2 data
points, one for eac
]
```
-#### Grand totals
+## Grand totals
Druid can include an extra "grand totals" row as the last row of a timeseries
result set. To enable this, add
`"grandTotal" : true` to your query context. For example:
@@ -119,7 +119,7 @@ The grand totals row will appear as the last row in the
result array, and will h
row even if the query is run in "descending" mode. Post-aggregations in the
grand totals row will be computed based
upon the grand total aggregations.
-#### Zero-filling
+## Zero-filling
Timeseries queries normally fill empty interior time buckets with zeroes. For
example, if you issue a "day" granularity
timeseries query for the interval 2012-01-01/2012-01-04, and no data exists
for 2012-01-02, you will receive:
diff --git a/docs/querying/topnquery.md b/docs/querying/topnquery.md
index 5df38bc..1e6609b 100644
--- a/docs/querying/topnquery.md
+++ b/docs/querying/topnquery.md
@@ -149,7 +149,7 @@ The format of the results would look like so:
]
```
-### Behavior on multi-value dimensions
+## Behavior on multi-value dimensions
topN queries can group on multi-value dimensions. When grouping on a
multi-value dimension, _all_ values
from matching rows will be used to generate one group per value. It's possible
for a query to return more groups than
@@ -160,7 +160,7 @@ improve performance.
See [Multi-value dimensions](multi-value-dimensions.html) for more details.
-### Aliasing
+## Aliasing
The current TopN algorithm is an approximate algorithm. The top 1000 local
results from each segment are returned for merging to determine the global
topN. As such, the topN algorithm is approximate in both rank and results.
Approximate results *ONLY APPLY WHEN THERE ARE MORE THAN 1000 DIM VALUES*. A
topN over a dimension with fewer than 1000 unique dimension values can be
considered accurate in rank and accurate in aggregates.
@@ -176,16 +176,16 @@ Users wishing to get an *exact rank and exact aggregates*
topN over a dimension
Users who can tolerate *approximate rank* topN over a dimension with greater
than 1000 unique values, but require *exact aggregates* can issue two queries.
One to get the approximate topN dimension values, and another topN with
dimension selection filters which only use the topN results of the first.
-#### Example First query:
+### Example First query
```json
{
"aggregations": [
- {
- "fieldName": "L_QUANTITY_longSum",
- "name": "L_QUANTITY_",
- "type": "longSum"
- }
+ {
+ "fieldName": "L_QUANTITY_longSum",
+ "name": "L_QUANTITY_",
+ "type": "longSum"
+ }
],
"dataSource": "tpch_year",
"dimension":"l_orderkey",
@@ -199,35 +199,35 @@ Users who can tolerate *approximate rank* topN over a
dimension with greater tha
}
```
-#### Example second query:
+### Example second query
```json
{
"aggregations": [
- {
- "fieldName": "L_TAX_doubleSum",
- "name": "L_TAX_",
- "type": "doubleSum"
- },
- {
- "fieldName": "L_DISCOUNT_doubleSum",
- "name": "L_DISCOUNT_",
- "type": "doubleSum"
- },
- {
- "fieldName": "L_EXTENDEDPRICE_doubleSum",
- "name": "L_EXTENDEDPRICE_",
- "type": "doubleSum"
- },
- {
- "fieldName": "L_QUANTITY_longSum",
- "name": "L_QUANTITY_",
- "type": "longSum"
- },
- {
- "name": "count",
- "type": "count"
- }
+ {
+ "fieldName": "L_TAX_doubleSum",
+ "name": "L_TAX_",
+ "type": "doubleSum"
+ },
+ {
+ "fieldName": "L_DISCOUNT_doubleSum",
+ "name": "L_DISCOUNT_",
+ "type": "doubleSum"
+ },
+ {
+ "fieldName": "L_EXTENDEDPRICE_doubleSum",
+ "name": "L_EXTENDEDPRICE_",
+ "type": "doubleSum"
+ },
+ {
+ "fieldName": "L_QUANTITY_longSum",
+ "name": "L_QUANTITY_",
+ "type": "longSum"
+ },
+ {
+ "name": "count",
+ "type": "count"
+ }
],
"dataSource": "tpch_year",
"dimension":"l_orderkey",
diff --git a/website/i18n/en.json b/website/i18n/en.json
index a2a1716..4abced1 100644
--- a/website/i18n/en.json
+++ b/website/i18n/en.json
@@ -50,6 +50,9 @@
"design/coordinator": {
"title": "Coordinator Process"
},
+ "design/extensions-contrib/dropwizard": {
+ "title": "Dropwizard metrics emitter"
+ },
"design/historical": {
"title": "Historical Process"
},
@@ -336,9 +339,6 @@
"operations/pull-deps": {
"title": "pull-deps tool"
},
- "operations/recommendations": {
- "title": "Recommendations"
- },
"operations/reset-cluster": {
"title": "reset-cluster tool"
},
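
Note on the "Array based result rows" section added to docs/querying/groupbyquery.md in this commit: because the row schema is not part of the response, a client has to derive the field order from the query it sent. The following is a minimal Python sketch of that derivation, not Druid code. The function names `result_row_signature` and `row_to_dict`, the `"timestamp"` label, and the sample query are all hypothetical. It assumes `granularity` is given either as a string or as an object with a `type` field, and that each dimension is either a plain column name or an object with `outputName`.

```python
def result_row_signature(query):
    """Return the positional field names for an array-based groupBy row.

    Order follows the doc text: timestamp (unless granularity is ALL),
    then dimensions, aggregators, and post-aggregators.
    """
    fields = []

    # Assumption: granularity is a string ("all", "day", ...) or an
    # object such as {"type": "all"}.
    granularity = query["granularity"]
    kind = granularity if isinstance(granularity, str) else granularity.get("type", "")
    if kind.lower() != "all":
        fields.append("timestamp")  # hypothetical label for the time field

    for dim in query.get("dimensions", []):
        # Assumption: a dimension is a column name or a spec with "outputName".
        fields.append(dim if isinstance(dim, str) else dim["outputName"])

    fields.extend(agg["name"] for agg in query.get("aggregations", []))
    fields.extend(post["name"] for post in query.get("postAggregations", []))
    return fields


def row_to_dict(query, row):
    """Pair one positional result row with the derived field names."""
    names = result_row_signature(query)
    if len(names) != len(row):
        raise ValueError("row length does not match the derived schema")
    return dict(zip(names, row))


# Hypothetical query, for illustration only.
example_query = {
    "queryType": "groupBy",
    "granularity": "day",
    "dimensions": ["country", {"type": "default", "dimension": "device", "outputName": "dev"}],
    "aggregations": [{"type": "count", "name": "rows"}],
    "postAggregations": [{"type": "arithmetic", "name": "avg"}],
    "context": {"resultAsArray": True},
}
```

Deriving the names once per query and reusing them for every row keeps the per-row cost to a single `zip`, which is in the spirit of avoiding the map translation overhead the section describes.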