[incubator-druid] branch 0.16.0-incubating updated: fix doc headers (#8729) (#8737)

cwylie Fri, 25 Oct 2019 01:15:07 -0700

This is an automated email from the ASF dual-hosted git repository.

cwylie pushed a commit to branch 0.16.0-incubating
in repository https://gitbox.apache.org/repos/asf/incubator-druid.git



The following commit(s) were added to refs/heads/0.16.0-incubating by this push:
     new 081aee6  fix doc headers (#8729) (#8737)
081aee6 is described below

commit 081aee66a775e2320b94e451f04bc486f8cad0b9
Author: Clint Wylie <[email protected]>
AuthorDate: Fri Oct 25 01:14:42 2019 -0700

    fix doc headers (#8729) (#8737)
---
 docs/dependencies/zookeeper.md        |  6 +--
 docs/querying/groupbyquery.md         | 70 +++++++++++++++++++++--------------
 docs/querying/searchquery.md          |  4 +-
 docs/querying/segmentmetadataquery.md | 30 +++++++--------
 docs/querying/timeseriesquery.md      |  4 +-
 docs/querying/topnquery.md            | 66 ++++++++++++++++-----------------
 website/i18n/en.json                  |  6 +--
 7 files changed, 101 insertions(+), 85 deletions(-)

diff --git a/docs/dependencies/zookeeper.md b/docs/dependencies/zookeeper.md
index 58d571d..300ef74 100644
--- a/docs/dependencies/zookeeper.md
+++ b/docs/dependencies/zookeeper.md
@@ -31,7 +31,7 @@ Apache Druid (incubating) uses [Apache ZooKeeper](http://zookeeper.apache.org/)
 4.  [Overlord](../design/overlord.md) leader election
 5.  [Overlord](../design/overlord.md) and [MiddleManager](../design/middlemanager.md) task management
 
-### Coordinator Leader Election
+## Coordinator Leader Election
 
 We use the Curator LeadershipLatch recipe to do leader election at path
 
@@ -39,7 +39,7 @@ We use the Curator LeadershipLatch recipe to do leader election at path
 ${druid.zk.paths.coordinatorPath}/_COORDINATOR
 ```
 
-### Segment "publishing" protocol from Historical and Realtime
+## Segment "publishing" protocol from Historical and Realtime
 
 The `announcementsPath` and `servedSegmentsPath` are used for this.
 
@@ -63,7 +63,7 @@ ${druid.zk.paths.servedSegmentsPath}/${druid.host}/_segment_identifier_
 
 Processes like the [Coordinator](../design/coordinator.md) and [Broker](../design/broker.md) can then watch these paths to see which processes are currently serving which segments.
 
-### Segment load/drop protocol between Coordinator and Historical
+## Segment load/drop protocol between Coordinator and Historical
 
 The `loadQueuePath` is used for this.
 
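The ZooKeeper paths quoted in `zookeeper.md` are written with `${property}` placeholders, for example `${druid.zk.paths.coordinatorPath}/_COORDINATOR`. Below is a minimal Python sketch of how such a template expands into a concrete znode path. The property values (`/druid/coordinator`, `/druid/servedSegments`, `historical-1:8083`) are hypothetical and chosen only for illustration. The sketch models the notation, not Druid's own configuration code.

```python
from string import Template


class ZkPathTemplate(Template):
    # Allow dotted property names such as druid.zk.paths.coordinatorPath.
    idpattern = r"[_a-z][_a-z0-9.]*"


def expand_zk_path(template: str, properties: dict) -> str:
    """Replace each ${property} placeholder with its configured value."""
    return ZkPathTemplate(template).substitute(properties)


# Hypothetical values, for illustration only.
props = {
    "druid.zk.paths.coordinatorPath": "/druid/coordinator",
    "druid.zk.paths.servedSegmentsPath": "/druid/servedSegments",
    "druid.host": "historical-1:8083",
}

print(expand_zk_path("${druid.zk.paths.coordinatorPath}/_COORDINATOR", props))
print(expand_zk_path(
    "${druid.zk.paths.servedSegmentsPath}/${druid.host}/_segment_identifier_", props))
```

`_segment_identifier_` is left as literal text here, because in the docs it stands for the announced segment's identifier rather than a configuration property.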
diff --git a/docs/querying/groupbyquery.md b/docs/querying/groupbyquery.md
index 5e3e7cf..1e5c13e 100644
--- a/docs/querying/groupbyquery.md
+++ b/docs/querying/groupbyquery.md
@@ -122,7 +122,7 @@ To pull it all together, the above query would return *n\*m* data points, up to
 ]
 ```
 
-### Behavior on multi-value dimensions
+## Behavior on multi-value dimensions
 
 groupBy queries can group on multi-value dimensions. When grouping on a multi-value dimension, _all_ values
 from matching rows will be used to generate one group per value. It's possible for a query to return more groups than
@@ -133,7 +133,7 @@ improve performance.
 
 See [Multi-value dimensions](multi-value-dimensions.html) for more details.
 
-### More on subtotalsSpec
+## More on subtotalsSpec
 The subtotals feature allows computation of multiple sub-groupings in a single query. To use this feature, add a "subtotalsSpec" to your query, which should be a list of subgroup dimension sets. It should contain the "outputName" from dimensions in your "dimensions" attribute, in the same order as they appear in the "dimensions" attribute (although, of course, you may skip some). For example, consider a groupBy query like this one:
 
 ```json
@@ -219,9 +219,9 @@ Response for above query would look something like below...
 ]
 ```
 
-### Implementation details
+## Implementation details
 
-#### Strategies
+### Strategies
 
 GroupBy queries can be executed using two different strategies. The default strategy for a cluster is determined by the
 "druid.query.groupBy.defaultStrategy" runtime property on the Broker. This can be overridden using "groupByStrategy" in
@@ -242,7 +242,7 @@ merging is always single-threaded. Because the Broker merges results using the i
 the full result set before returning any results. On both the data processes and the Broker, the merging index is fully
 on-heap by default, but it can optionally store aggregated values off-heap.
 
-#### Differences between v1 and v2
+### Differences between v1 and v2
 
 Query API and results are compatible between the two engines; however, there are some differences from a cluster
 configuration perspective:
@@ -263,30 +263,30 @@ ignores chunkPeriod.
 when the grouping key is a single indexed string column. In array-based aggregation, the dictionary-encoded value is used
 as the index, so the aggregated values in the array can be accessed directly without finding buckets based on hashing.
 
-#### Memory tuning and resource limits
+### Memory tuning and resource limits
 
 When using groupBy v2, three parameters control resource usage and limits:
 
-- druid.processing.buffer.sizeBytes: size of the off-heap hash table used for aggregation, per query, in bytes. At
-most druid.processing.numMergeBuffers of these will be created at once, which also serves as an upper limit on the
+- `druid.processing.buffer.sizeBytes`: size of the off-heap hash table used for aggregation, per query, in bytes. At
+most `druid.processing.numMergeBuffers` of these will be created at once, which also serves as an upper limit on the
 number of concurrently running groupBy queries.
-- druid.query.groupBy.maxMergingDictionarySize: size of the on-heap dictionary used when grouping on strings, per query,
+- `druid.query.groupBy.maxMergingDictionarySize`: size of the on-heap dictionary used when grouping on strings, per query,
 in bytes. Note that this is based on a rough estimate of the dictionary size, not the actual size.
-- druid.query.groupBy.maxOnDiskStorage: amount of space on disk used for aggregation, per query, in bytes. By default,
+- `druid.query.groupBy.maxOnDiskStorage`: amount of space on disk used for aggregation, per query, in bytes. By default,
 this is 0, which means aggregation will not use disk.
 
-If maxOnDiskStorage is 0 (the default) then a query that exceeds either the on-heap dictionary limit, or the off-heap
+If `maxOnDiskStorage` is 0 (the default) then a query that exceeds either the on-heap dictionary limit, or the off-heap
 aggregation table limit, will fail with a "Resource limit exceeded" error describing the limit that was exceeded.
 
-If maxOnDiskStorage is greater than 0, queries that exceed the in-memory limits will start using disk for aggregation.
+If `maxOnDiskStorage` is greater than 0, queries that exceed the in-memory limits will start using disk for aggregation.
 In this case, when either the on-heap dictionary or off-heap hash table fills up, partially aggregated records will be
 sorted and flushed to disk. Then, both in-memory structures will be cleared out for further aggregation. Queries that
-then go on to exceed maxOnDiskStorage will fail with a "Resource limit exceeded" error indicating that they ran out of
+then go on to exceed `maxOnDiskStorage` will fail with a "Resource limit exceeded" error indicating that they ran out of
 disk space.
 
 With groupBy v2, cluster operators should make sure that the off-heap hash tables and on-heap merging dictionaries
 will not exceed available memory for the maximum possible concurrent query load (given by
-druid.processing.numMergeBuffers). See the [basic cluster tuning guide](../operations/basic-cluster-tuning.md) 
+`druid.processing.numMergeBuffers`). See the [basic cluster tuning guide](../operations/basic-cluster-tuning.md) 
 for more details about direct memory usage, organized by Druid process type.
 
 Brokers do not need merge buffers for basic groupBy queries. Queries with subqueries (using a `query` dataSource) require one merge buffer if there is a single subquery, or two merge buffers if there is more than one layer of nested subqueries. Queries with [subtotals](groupbyquery.html#more-on-subtotalsspec) need one merge buffer. These can stack on top of each other: a groupBy query with multiple layers of nested subqueries, and that also uses subtotals, will need three merge buffers.
@@ -294,26 +294,26 @@ Brokers do not need merge buffers for basic groupBy queries. Queries with subque
 Historicals and ingestion tasks need one merge buffer for each groupBy query, unless [parallel combination](groupbyquery.html#parallel-combine) is enabled, in which case they need two merge buffers per query.
 
 When using groupBy v1, all aggregation is done on-heap, and resource limits are done through the parameter
-druid.query.groupBy.maxResults. This is a cap on the maximum number of results in a result set. Queries that exceed
+`druid.query.groupBy.maxResults`. This is a cap on the maximum number of results in a result set. Queries that exceed
 this limit will fail with a "Resource limit exceeded" error indicating they exceeded their row limit. Cluster
 operators should make sure that the on-heap aggregations will not exceed available JVM heap space for the expected
 concurrent query load.
 
-#### Performance tuning for groupBy v2
+### Performance tuning for groupBy v2
 
-##### Limit pushdown optimization
+#### Limit pushdown optimization
 
 Druid pushes down the `limit` spec in groupBy queries to the segments on Historicals wherever possible to early prune unnecessary intermediate results and minimize the amount of data transferred to Brokers. By default, this technique is applied only when all fields in the `orderBy` spec is a subset of the grouping keys. This is because the `limitPushDown` doesn't guarantee the exact results if the `orderBy` spec includes any fields that are not in the grouping keys. However, you can enab [...]
 
 
-##### Optimizing hash table
+#### Optimizing hash table
 
 The groupBy v2 engine uses an open addressing hash table for aggregation. The hash table is initialized with a given initial bucket number and gradually grows on buffer full. On hash collisions, the linear probing technique is used.
 
 The default number of initial buckets is 1024 and the default max load factor of the hash table is 0.7. If you can see too many collisions in the hash table, you can adjust these numbers. See `bufferGrouperInitialBuckets` and `bufferGrouperMaxLoadFactor` in [Advanced groupBy v2 configurations](#groupby-v2-configurations).
 
 
-##### Parallel combine
+#### Parallel combine
 
 Once a Historical finishes aggregation using the hash table, it sorts the aggregated results and merges them before sending to the
 Broker for N-way merge aggregation in the broker. By default, Historicals use all their available processing threads
@@ -341,7 +341,7 @@ Please note that each Historical needs two merge buffers to process a groupBy v2
 computing intermediate aggregates from each segment and another for combining intermediate aggregates in parallel.
 
 
-#### Alternatives
+### Alternatives
 
 There are some situations where other query types may be a better choice than groupBy.
 
@@ -353,7 +353,7 @@ advantage of the fact that segments are already sorted on time) and does not nee
 will sometimes be faster than groupBy. This is especially true if you are ordering by a metric and find approximate
 results acceptable.
 
-#### Nested groupBys
+### Nested groupBys
 
 Nested groupBys (dataSource of type "query") are performed differently for "v1" and "v2". The Broker first runs the
 inner groupBy query in the usual way. "v1" strategy then materializes the inner query's results on-heap with Druid's
@@ -361,11 +361,11 @@ indexing mechanism, and runs the outer query on these materialized results. "v2"
 inner query's results stream with off-heap fact map and on-heap string dictionary that can spill to disk. Both
 strategy perform the outer query on the Broker in a single-threaded fashion.
 
-#### Configurations
+### Configurations
 
 This section describes the configurations for groupBy queries. You can set the runtime properties in the `runtime.properties` file on Broker, Historical, and MiddleManager processes. You can set the query context parameters through the [query context](query-context.html).
 
-##### Configurations for groupBy v2
+#### Configurations for groupBy v2
 
 Supported runtime properties:
 
@@ -382,9 +382,9 @@ Supported query contexts:
 |`maxOnDiskStorage`|Can be used to lower the value of `druid.query.groupBy.maxOnDiskStorage` for this query.|
 
 
-#### Advanced configurations
+### Advanced configurations
 
-##### Common configurations for all groupBy strategies
+#### Common configurations for all groupBy strategies
 
 Supported runtime properties:
 
@@ -401,7 +401,7 @@ Supported query contexts:
 |`groupByIsSingleThreaded`|Overrides the value of `druid.query.groupBy.singleThreaded` for this query.|
 
 
-##### GroupBy v2 configurations
+#### GroupBy v2 configurations
 
 Supported runtime properties:
 
@@ -426,7 +426,7 @@ Supported query contexts:
 |`forceLimitPushDown`|When all fields in the orderby are part of the grouping key, the Broker will push limit application down to the Historical processes. When the sorting order uses fields that are not in the grouping key, applying this optimization can result in approximate results with unknown accuracy, so this optimization is disabled by default in that case. Enabling this context flag turns on limit push down for limit/orderbys that contain non-grouping key columns.|false|
 
 
-##### GroupBy v1 configurations
+#### GroupBy v1 configurations
 
 Supported runtime properties:
 
@@ -442,3 +442,19 @@ Supported query contexts:
 |`maxIntermediateRows`|Can be used to lower the value of `druid.query.groupBy.maxIntermediateRows` for this query.|None|
 |`maxResults`|Can be used to lower the value of `druid.query.groupBy.maxResults` for this query.|None|
 |`useOffheap`|Set to true to store aggregations off-heap when merging results.|false|
+
+#### Array based result rows
+
+Internally Druid always uses an array based representation of groupBy result rows, but by default this is translated
+into a map based result format at the Broker. To reduce the overhead of this translation, results may also be returned
+from the Broker directly in the array based format if `resultAsArray` is set to `true` on the query context.
+
+Each row is positional, and has the following fields, in order:
+
+* Timestamp (optional; only if granularity != ALL)
+* Dimensions (in order)
+* Aggregators (in order)
+* Post-aggregators (optional; in order, if present)
+
+This schema is not available on the response, so it must be computed from the issued query in order to properly read
+the results.
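The added "Array based result rows" section states that the row schema must be computed from the issued query. Below is a minimal Python sketch of that computation. It assumes the query is available as a parsed JSON dict. The `timestamp` label for the first column and the helper names are hypothetical, chosen for illustration. Column order follows the list in the docs: optional timestamp, dimensions, aggregators, then post-aggregators.

```python
from typing import Any, Dict, List


def _is_all_granularity(granularity: Any) -> bool:
    # Granularity may be a string ("all") or an object ({"type": "all"}).
    if isinstance(granularity, dict):
        granularity = granularity.get("type", "")
    return str(granularity).lower() == "all"


def _output_name(dimension: Any) -> str:
    # Dimensions may be plain strings or dimension specs with an optional outputName.
    if isinstance(dimension, str):
        return dimension
    return dimension.get("outputName", dimension["dimension"])


def row_schema(query: Dict[str, Any]) -> List[str]:
    """Positional column names of an array-based groupBy result row."""
    names: List[str] = []
    if not _is_all_granularity(query["granularity"]):
        names.append("timestamp")  # hypothetical label for the timestamp column
    names += [_output_name(d) for d in query.get("dimensions", [])]
    names += [a["name"] for a in query.get("aggregations", [])]
    names += [p["name"] for p in query.get("postAggregations", [])]
    return names


def row_to_map(query: Dict[str, Any], row: List[Any]) -> Dict[str, Any]:
    """Translate one array-based row into a name -> value map."""
    schema = row_schema(query)
    if len(row) != len(schema):
        raise ValueError(f"expected {len(schema)} values, got {len(row)}")
    return dict(zip(schema, row))


query = {
    "queryType": "groupBy",
    "granularity": "all",
    "dimensions": ["country", {"type": "default", "dimension": "device", "outputName": "dev"}],
    "aggregations": [{"type": "count", "name": "rows"}],
}
print(row_to_map(query, ["US", "mobile", 42]))  # {'country': 'US', 'dev': 'mobile', 'rows': 42}
```

Rows are decoded positionally, so this translation is only correct if the client uses the same query it sent. The response carries no schema to cross-check against.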
diff --git a/docs/querying/searchquery.md b/docs/querying/searchquery.md
index 2077a09..71494a5 100644
--- a/docs/querying/searchquery.md
+++ b/docs/querying/searchquery.md
@@ -124,7 +124,7 @@ only the rows which satisfy those filters, thereby saving I/O cost. However, it
 and cursor-based execution plans, and chooses the optimal one. Currently, it is not enabled by default due to the overhead
 of cost estimation.
 
-#### Server configuration
+## Server configuration
 
 The following runtime properties apply:
 
@@ -132,7 +132,7 @@ The following runtime properties apply:
 |--------|-----------|-------|
 |`druid.query.search.searchStrategy`|Default search query strategy.|useIndexes|
 
-#### Query context
+## Query context
 
 The following query context parameters apply:
 
diff --git a/docs/querying/segmentmetadataquery.md b/docs/querying/segmentmetadataquery.md
index f23ce3e..e6a2651 100644
--- a/docs/querying/segmentmetadataquery.md
+++ b/docs/querying/segmentmetadataquery.md
@@ -89,18 +89,18 @@ undefined.
 
 Only columns which are dimensions (i.e., have type `STRING`) will have any cardinality. Rest of the columns (timestamp and metric columns) will show cardinality as `null`.
 
-### intervals
+## intervals
 
 If an interval is not specified, the query will use a default interval that spans a configurable period before the end time of the most recent segment.
 
 The length of this default time period is set in the Broker configuration via:
   druid.query.segmentMetadata.defaultHistory
 
-### toInclude
+## toInclude
 
 There are 3 types of toInclude objects.
 
-#### All
+### All
 
 The grammar is as follows:
 
@@ -108,7 +108,7 @@ The grammar is as follows:
 "toInclude": { "type": "all"}
 ```
 
-#### None
+### None
 
 The grammar is as follows:
 
@@ -116,7 +116,7 @@ The grammar is as follows:
 "toInclude": { "type": "none"}
 ```
 
-#### List
+### List
 
 The grammar is as follows:
 
@@ -124,7 +124,7 @@ The grammar is as follows:
 "toInclude": { "type": "list", "columns": [<string list of column names>]}
 ```
 
-### analysisTypes
+## analysisTypes
 
 This is a list of properties that determines the amount of information returned about the columns, i.e. analyses to be performed on the columns.
 
@@ -135,32 +135,32 @@ The default analysis types can be set in the Broker configuration via:
 
 Types of column analyses are described below:
 
-#### cardinality
+### cardinality
 
 * `cardinality` in the result will return the estimated floor of cardinality for each column. Only relevant for
 dimension columns.
 
-#### minmax
+### minmax
 
 * Estimated min/max values for each column. Only relevant for dimension columns.
 
-#### size
+### size
 
 * `size` in the result will contain the estimated total segment byte size as if the data were stored in text format
 
-#### interval
+### interval
 
 * `intervals` in the result will contain the list of intervals associated with the queried segments.
 
-#### timestampSpec
+### timestampSpec
 
 * `timestampSpec` in the result will contain timestampSpec of data stored in segments. this can be null if timestampSpec of segments was unknown or unmergeable (if merging is enabled).
 
-#### queryGranularity
+### queryGranularity
 
 * `queryGranularity` in the result will contain query granularity of data stored in segments. this can be null if query granularity of segments was unknown or unmergeable (if merging is enabled).
 
-#### aggregators
+### aggregators
 
 * `aggregators` in the result will contain the list of aggregators usable for querying metric columns. This may be
 null if the aggregators are unknown or unmergeable (if merging is enabled).
@@ -169,12 +169,12 @@ null if the aggregators are unknown or unmergeable (if merging is enabled).
 
 * The form of the result is a map of column name to aggregator.
 
-#### rollup
+### rollup
 
 * `rollup` in the result is true/false/null.
 * When merging is enabled, if some are rollup, others are not, result is null.
 
-### lenientAggregatorMerge
+## lenientAggregatorMerge
 
 Conflicts between aggregator metadata across segments can occur if some segments have unknown aggregators, or if
 two segments use incompatible aggregators for the same column (e.g. longSum changed to doubleSum).
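The `rollup` analysis in `segmentmetadataquery.md` reports true, false, or null, and merges to null when some segments are rolled up and others are not. Below is a minimal Python sketch of that merge rule. It treats an unknown (`None`) per-segment flag, or an empty input, as a conflict that also yields null. That part is an assumption, because the docs quoted here only cover the mixed case.

```python
from typing import Iterable, Optional


def merge_rollup(flags: Iterable[Optional[bool]]) -> Optional[bool]:
    """Merge per-segment rollup flags into one segmentMetadata-style result."""
    distinct = set(flags)
    if distinct == {True}:
        return True   # every segment is rolled up
    if distinct == {False}:
        return False  # no segment is rolled up
    return None       # mixed, unknown, or no segments: report null


print(merge_rollup([True, True]), merge_rollup([True, False]))  # True None
```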
diff --git a/docs/querying/timeseriesquery.md b/docs/querying/timeseriesquery.md
index 41ca0c7..69fdaa2 100644
--- a/docs/querying/timeseriesquery.md
+++ b/docs/querying/timeseriesquery.md
@@ -94,7 +94,7 @@ To pull it all together, the above query would return 2 data points, one for eac
 ]
 ```
 
-#### Grand totals
+## Grand totals
 
 Druid can include an extra "grand totals" row as the last row of a timeseries result set. To enable this, add
 `"grandTotal" : true` to your query context. For example:
@@ -119,7 +119,7 @@ The grand totals row will appear as the last row in the result array, and will h
 row even if the query is run in "descending" mode. Post-aggregations in the grand totals row will be computed based
 upon the grand total aggregations.
 
-#### Zero-filling
+## Zero-filling
 
 Timeseries queries normally fill empty interior time buckets with zeroes. For example, if you issue a "day" granularity
 timeseries query for the interval 2012-01-01/2012-01-04, and no data exists for 2012-01-02, you will receive:
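The zero-filling paragraph in `timeseriesquery.md` uses a "day" granularity query over 2012-01-01/2012-01-04 with no data for 2012-01-02. Below is a minimal Python sketch of interior zero-filling for that case. It simplifies by assuming that "interior" means any day between the first and last non-empty day bucket. The aggregate values are invented. This is a model of the documented behavior, not Druid's implementation.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple


def zero_fill_daily(buckets: Dict[date, int]) -> List[Tuple[date, int]]:
    """Emit one (day, value) pair per day, using 0 for empty interior days."""
    if not buckets:
        return []
    day, last = min(buckets), max(buckets)
    filled = []
    while day <= last:
        filled.append((day, buckets.get(day, 0)))
        day += timedelta(days=1)
    return filled


# Hypothetical aggregate values; 2012-01-02 has no data.
for day, value in zero_fill_daily({date(2012, 1, 1): 10, date(2012, 1, 3): 7}):
    print(day.isoformat(), value)
```

This prints `2012-01-01 10`, `2012-01-02 0` and `2012-01-03 7`. The missing day appears as a zero row instead of being omitted.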
diff --git a/docs/querying/topnquery.md b/docs/querying/topnquery.md
index 5df38bc..1e6609b 100644
--- a/docs/querying/topnquery.md
+++ b/docs/querying/topnquery.md
@@ -149,7 +149,7 @@ The format of the results would look like so:
 ]
 ```
 
-### Behavior on multi-value dimensions
+## Behavior on multi-value dimensions
 
 topN queries can group on multi-value dimensions. When grouping on a multi-value dimension, _all_ values
 from matching rows will be used to generate one group per value. It's possible for a query to return more groups than
@@ -160,7 +160,7 @@ improve performance.
 
 See [Multi-value dimensions](multi-value-dimensions.html) for more details.
 
-### Aliasing
+## Aliasing
 
 The current TopN algorithm is an approximate algorithm. The top 1000 local results from each segment are returned for merging to determine the global topN. As such, the topN algorithm is approximate in both rank and results. Approximate results *ONLY APPLY WHEN THERE ARE MORE THAN 1000 DIM VALUES*. A topN over a dimension with fewer than 1000 unique dimension values can be considered accurate in rank and accurate in aggregates.
 
@@ -176,16 +176,16 @@ Users wishing to get an *exact rank and exact aggregates* topN over a dimension
 
 Users who can tolerate *approximate rank* topN over a dimension with greater than 1000 unique values, but require *exact aggregates* can issue two queries. One to get the approximate topN dimension values, and another topN with dimension selection filters which only use the topN results of the first.
 
-#### Example First query:
+### Example First query
 
 ```json
 {
     "aggregations": [
-             {
-                 "fieldName": "L_QUANTITY_longSum",
-                 "name": "L_QUANTITY_",
-                 "type": "longSum"
-             }
+         {
+             "fieldName": "L_QUANTITY_longSum",
+             "name": "L_QUANTITY_",
+             "type": "longSum"
+         }
     ],
     "dataSource": "tpch_year",
     "dimension":"l_orderkey",
@@ -199,35 +199,35 @@ Users who can tolerate *approximate rank* topN over a dimension with greater tha
 }
 ```
 
-#### Example second query:
+### Example second query
 
 ```json
 {
     "aggregations": [
-             {
-                 "fieldName": "L_TAX_doubleSum",
-                 "name": "L_TAX_",
-                 "type": "doubleSum"
-             },
-             {
-                 "fieldName": "L_DISCOUNT_doubleSum",
-                 "name": "L_DISCOUNT_",
-                 "type": "doubleSum"
-             },
-             {
-                 "fieldName": "L_EXTENDEDPRICE_doubleSum",
-                 "name": "L_EXTENDEDPRICE_",
-                 "type": "doubleSum"
-             },
-             {
-                 "fieldName": "L_QUANTITY_longSum",
-                 "name": "L_QUANTITY_",
-                 "type": "longSum"
-             },
-             {
-                 "name": "count",
-                 "type": "count"
-             }
+         {
+             "fieldName": "L_TAX_doubleSum",
+             "name": "L_TAX_",
+             "type": "doubleSum"
+         },
+         {
+             "fieldName": "L_DISCOUNT_doubleSum",
+             "name": "L_DISCOUNT_",
+             "type": "doubleSum"
+         },
+         {
+             "fieldName": "L_EXTENDEDPRICE_doubleSum",
+             "name": "L_EXTENDEDPRICE_",
+             "type": "doubleSum"
+         },
+         {
+             "fieldName": "L_QUANTITY_longSum",
+             "name": "L_QUANTITY_",
+             "type": "longSum"
+         },
+         {
+             "name": "count",
+             "type": "count"
+         }
     ],
     "dataSource": "tpch_year",
     "dimension":"l_orderkey",
diff --git a/website/i18n/en.json b/website/i18n/en.json
index a2a1716..4abced1 100644
--- a/website/i18n/en.json
+++ b/website/i18n/en.json
@@ -50,6 +50,9 @@
       "design/coordinator": {
         "title": "Coordinator Process"
       },
+      "design/extensions-contrib/dropwizard": {
+        "title": "Dropwizard metrics emitter"
+      },
       "design/historical": {
         "title": "Historical Process"
       },
@@ -336,9 +339,6 @@
       "operations/pull-deps": {
         "title": "pull-deps tool"
       },
-      "operations/recommendations": {
-        "title": "Recommendations"
-      },
       "operations/reset-cluster": {
         "title": "reset-cluster tool"
       },
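The groupBy v2 context lines in `groupbyquery.md` describe the aggregation hash table. It uses open addressing with linear probing on collisions, starts with 1024 buckets by default, grows as it fills, and has a default max load factor of 0.7. Below is a toy Python model of that scheme. It assumes the table doubles whenever the next new key would push it past the load factor. The class and method names are hypothetical, and the model does not reflect Druid's off-heap buffer layout.

```python
from typing import Any, List, Optional


class LinearProbingTable:
    """Toy open-addressing table that sums integer values per key."""

    def __init__(self, initial_buckets: int = 1024, max_load_factor: float = 0.7):
        if initial_buckets < 1 or not 0 < max_load_factor < 1:
            raise ValueError("need initial_buckets >= 1 and 0 < max_load_factor < 1")
        self._max_load = max_load_factor
        self._size = 0
        self.collisions = 0  # counts every extra probe step
        self._allocate(initial_buckets)

    def _allocate(self, n: int) -> None:
        self._keys: List[Optional[Any]] = [None] * n
        self._vals: List[int] = [0] * n
        self._used: List[bool] = [False] * n

    def _slot(self, key: Any) -> int:
        # Start at the hashed bucket; on collision, try the next one (linear probing).
        n = len(self._keys)
        i = hash(key) % n
        while self._used[i] and self._keys[i] != key:
            self.collisions += 1
            i = (i + 1) % n
        return i

    def _grow(self) -> None:
        # Assumed policy: double the bucket count and re-insert every live entry.
        live = [(k, v) for k, v, u in zip(self._keys, self._vals, self._used) if u]
        self._allocate(2 * len(self._keys))
        for k, v in live:
            i = self._slot(k)
            self._keys[i], self._vals[i], self._used[i] = k, v, True

    def aggregate(self, key: Any, value: int) -> None:
        """Add value to the running sum for key."""
        i = self._slot(key)
        if not self._used[i]:
            if self._size + 1 > self._max_load * len(self._keys):
                self._grow()
                i = self._slot(key)
            self._keys[i], self._used[i] = key, True
            self._size += 1
        self._vals[i] += value

    def get(self, key: Any, default: int = 0) -> int:
        i = self._slot(key)
        return self._vals[i] if self._used[i] else default

    def __len__(self) -> int:
        return self._size

    @property
    def buckets(self) -> int:
        return len(self._keys)


table = LinearProbingTable(initial_buckets=4)
for word in ["a", "b", "a", "c", "d", "a"]:
    table.aggregate(word, 1)
print(table.get("a"), len(table), table.buckets)  # 3 4 8
```

Keeping the load factor below 1 guarantees that a probe always reaches an empty bucket. A lower factor trades memory for shorter probe runs, which is the tradeoff behind `bufferGrouperMaxLoadFactor`.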

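The aliasing section of `topnquery.md` explains that only each segment's top 1000 local results are merged, so both rank and aggregates can be approximate. Below is a toy Python model of that effect. The hypothetical parameter `k` stands in for the 1000-value local threshold, and the segment data is invented for illustration. It is not Druid's topN implementation.

```python
from collections import Counter
from typing import Dict, List, Tuple

Segment = Dict[str, int]


def _top(counts: Dict[str, int], n: int) -> List[Tuple[str, int]]:
    # Sort by descending value, then by key so ties are deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


def exact_topn(segments: List[Segment], n: int) -> List[Tuple[str, int]]:
    """Merge every value from every segment, then rank."""
    total: Counter = Counter()
    for seg in segments:
        total.update(seg)
    return _top(total, n)


def approx_topn(segments: List[Segment], n: int, k: int) -> List[Tuple[str, int]]:
    """Merge only each segment's local top-k, then rank."""
    merged: Counter = Counter()
    for seg in segments:
        merged.update(dict(_top(seg, k)))
    return _top(merged, n)


segments = [{"a": 10, "c": 9}, {"b": 10, "c": 9}]
print(exact_topn(segments, 1))        # [('c', 18)]
print(approx_topn(segments, 1, k=1))  # [('a', 10)]
```

With `k=1`, value `c` is second in every segment and never survives the local cut. The merged answer is therefore `a`, even though `c` has the largest total. Raising `k` to 2 recovers the exact result, which mirrors why a dimension with fewer unique values than the threshold is exact.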
