This is an automated email from the ASF dual-hosted git repository.
himanshug pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-druid.git
The following commit(s) were added to refs/heads/master by this push:
new c87b47e More documentation formatting fixes (#8149)
c87b47e is described below
commit c87b47e0fa5d5b93593fca43933f6389ebecef0d
Author: Magnus Henoch <[email protected]>
AuthorDate: Wed Jul 24 23:26:03 2019 +0100
More documentation formatting fixes (#8149)
Add empty lines before bulleted lists and code blocks, to ensure that
they show up properly on the web site. See also #8079.
---
.../extensions-core/approximate-histograms.md | 1 +
.../development/extensions-core/bloom-filter.md | 1 +
.../extensions-core/druid-basic-security.md | 2 ++
docs/content/ingestion/native_tasks.md | 1 +
docs/content/operations/basic-cluster-tuning.md | 2 ++
docs/content/operations/deep-storage-migration.md | 1 +
docs/content/operations/export-metadata.md | 22 ++++++++++++----------
docs/content/operations/metadata-migration.md | 1 +
docs/content/operations/recommendations.md | 1 +
docs/content/querying/aggregations.md | 1 +
docs/content/tutorials/cluster.md | 9 +++++++++
11 files changed, 32 insertions(+), 10 deletions(-)
diff --git a/docs/content/development/extensions-core/approximate-histograms.md b/docs/content/development/extensions-core/approximate-histograms.md
index 30b5f32..2e900d2 100644
--- a/docs/content/development/extensions-core/approximate-histograms.md
+++ b/docs/content/development/extensions-core/approximate-histograms.md
@@ -37,6 +37,7 @@ The Approximate Histogram aggregator is deprecated. Please use <a href="../exten
This aggregator is based on
[http://jmlr.org/papers/volume11/ben-haim10a/ben-haim10a.pdf](http://jmlr.org/papers/volume11/ben-haim10a/ben-haim10a.pdf)
to compute approximate histograms, with the following modifications:
+
- some tradeoffs in accuracy were made in the interest of speed (see below)
- the sketch maintains the exact original data as long as the number of
distinct data points is fewer than the resolution (number of centroids),
diff --git a/docs/content/development/extensions-core/bloom-filter.md b/docs/content/development/extensions-core/bloom-filter.md
index 3d6749a..f0563a0 100644
--- a/docs/content/development/extensions-core/bloom-filter.md
+++ b/docs/content/development/extensions-core/bloom-filter.md
@@ -33,6 +33,7 @@ to use with Druid for cases where an explicit filter is impossible, e.g. filteri
values.
Following are some characteristics of BloomFilters:
+
- BloomFilters are highly space efficient when compared to using a HashSet.
- Because of the probabilistic nature of bloom filters, false positive results
are possible (element was not actually
inserted into a bloom filter during construction, but `test()` says true)
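The characteristics above can be illustrated with a toy bloom filter (a generic sketch only, not Druid's actual implementation; bit-array size and hash construction are arbitrary choices):

```python
import hashlib

class ToyBloomFilter:
    """Minimal bloom filter: false positives possible, false negatives never."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # an int used as a compact bit array

    def _positions(self, value):
        # Derive k bit positions from seeded hashes of the value.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def test(self, value):
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits & (1 << pos) for pos in self._positions(value))

f = ToyBloomFilter()
f.add("druid")
assert f.test("druid")  # inserted values always test true
# A value that was never inserted usually tests False, but may test
# True (a false positive) by design -- exactly the tradeoff noted above.
```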
diff --git a/docs/content/development/extensions-core/druid-basic-security.md b/docs/content/development/extensions-core/druid-basic-security.md
index e067fdf..4189c3f 100644
--- a/docs/content/development/extensions-core/druid-basic-security.md
+++ b/docs/content/development/extensions-core/druid-basic-security.md
@@ -25,6 +25,7 @@ title: "Basic Security"
# Druid Basic Security
This Apache Druid (incubating) extension adds:
+
- an Authenticator which supports [HTTP Basic authentication](https://en.wikipedia.org/wiki/Basic_access_authentication)
- an Authorizer which implements basic role-based access control
@@ -342,6 +343,7 @@ Unassign role {roleName} from user {userName}
Set the permissions of {roleName}. This replaces the previous set of
permissions on the role.
Content: List of JSON Resource-Action objects, e.g.:
+
```
[
{
diff --git a/docs/content/ingestion/native_tasks.md b/docs/content/ingestion/native_tasks.md
index c5cd91b..1cf5e01 100644
--- a/docs/content/ingestion/native_tasks.md
+++ b/docs/content/ingestion/native_tasks.md
@@ -55,6 +55,7 @@ the implementation of splittable firehoses. Please note that multiple tasks can
if one of them fails.
You may want to consider the below points:
+
- Since this task doesn't shuffle intermediate data, it isn't available for
[perfect rollup](../ingestion/index.html#roll-up-modes).
- The number of tasks for parallel ingestion is decided by `maxNumSubTasks` in
the tuningConfig.
Since the supervisor task creates up to `maxNumSubTasks` worker tasks
regardless of the available task slots,
diff --git a/docs/content/operations/basic-cluster-tuning.md b/docs/content/operations/basic-cluster-tuning.md
index aa09c07..226cc92 100644
--- a/docs/content/operations/basic-cluster-tuning.md
+++ b/docs/content/operations/basic-cluster-tuning.md
@@ -37,6 +37,7 @@ If you have questions on tuning Druid for specific use cases, or questions on co
#### Heap sizing
The biggest contributions to heap usage on Historicals are:
+
- Partial unmerged query results from segments
- The stored maps for [lookups](../querying/lookups.html).
@@ -63,6 +64,7 @@ Be sure to add `(2 * total size of all loaded lookups)` to your heap size in add
Please see the [General Guidelines for Processing Threads and Buffers](#general-guidelines-for-processing-threads-and-buffers) section for an overview of processing thread/buffer configuration.
On Historicals:
+
- `druid.processing.numThreads` should generally be set to `(number of cores - 1)`: a smaller value can result in CPU underutilization, while going over the number of cores can result in unnecessary CPU contention.
- `druid.processing.buffer.sizeBytes` can be set to 500MB.
- `druid.processing.numMergeBuffers`, a 1:4 ratio of merge buffers to
processing threads is a reasonable choice for general use.
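The guidelines above amount to a small calculation. As a sketch (the 500MB buffer and the 1:4 merge-buffer ratio are the general-use suggestions from this section, not hard rules, and the floor of 2 merge buffers is an added assumption):

```python
def historical_processing_config(num_cores):
    """Suggested Historical starting points per the guidelines above."""
    # Leave one core free to avoid unnecessary CPU contention.
    num_threads = max(1, num_cores - 1)
    return {
        "druid.processing.numThreads": num_threads,
        "druid.processing.buffer.sizeBytes": 500_000_000,  # ~500MB
        # 1:4 ratio of merge buffers to processing threads, rounded up;
        # the minimum of 2 is an assumption, not stated in this section.
        "druid.processing.numMergeBuffers": max(2, -(-num_threads // 4)),
    }
```

For example, an 8-core Historical would get 7 processing threads and 2 merge buffers under these rules.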
diff --git a/docs/content/operations/deep-storage-migration.md b/docs/content/operations/deep-storage-migration.md
index 3fc61e7..55180a0 100644
--- a/docs/content/operations/deep-storage-migration.md
+++ b/docs/content/operations/deep-storage-migration.md
@@ -28,6 +28,7 @@ If you have been running an evaluation Druid cluster using local deep storage an
more production-capable deep storage system such as S3 or HDFS, this document
describes the necessary steps.
Migration of deep storage involves the following steps at a high level:
+
- Copying segments from local deep storage to the new deep storage
- Exporting Druid's segments table from metadata
- Rewriting the load specs in the exported segment data to reflect the new
deep storage location
diff --git a/docs/content/operations/export-metadata.md b/docs/content/operations/export-metadata.md
index 11c0f76..d1119ee 100644
--- a/docs/content/operations/export-metadata.md
+++ b/docs/content/operations/export-metadata.md
@@ -27,6 +27,7 @@ title: "Export Metadata Tool"
Druid includes an `export-metadata` tool for assisting with migration of
cluster metadata and deep storage.
This tool exports the contents of the following Druid metadata tables:
+
- segments
- rules
- config
@@ -37,6 +38,7 @@ Additionally, the tool can rewrite the local deep storage location descriptors i
to point to new deep storage locations (S3, HDFS, and local rewrite paths are
supported).
The tool has the following limitations:
+
- Only exporting from Derby metadata is currently supported
- If rewriting load specs for deep storage migration, only migrating from
local deep storage is currently supported.
@@ -46,20 +48,19 @@ The `export-metadata` tool provides the following options:
### Connection Properties
-`--connectURI`: The URI of the Derby database, e.g. `jdbc:derby://localhost:1527/var/druid/metadata.db;create=true`
-`--user`: Username
-`--password`: Password
-`--base`: corresponds to the value of `druid.metadata.storage.tables.base` in the configuration, `druid` by default.
+- `--connectURI`: The URI of the Derby database, e.g. `jdbc:derby://localhost:1527/var/druid/metadata.db;create=true`
+- `--user`: Username
+- `--password`: Password
+- `--base`: corresponds to the value of `druid.metadata.storage.tables.base` in the configuration, `druid` by default.
### Output Path
-`--output-path`, `-o`: The output directory of the tool. CSV files for the Druid segments, rules, config, datasource, and supervisors tables will be written to this directory.
+- `--output-path`, `-o`: The output directory of the tool. CSV files for the Druid segments, rules, config, datasource, and supervisors tables will be written to this directory.
### Export Format Options
-`--use-hex-blobs`, `-x`: If set, export BLOB payload columns as hexadecimal strings. This needs to be set if importing back into Derby. Default is false.
-
-`--booleans-as-strings`, `-t`: If set, write boolean values as "true" or "false" instead of "1" and "0". This needs to be set if importing back into Derby. Default is false.
+- `--use-hex-blobs`, `-x`: If set, export BLOB payload columns as hexadecimal strings. This needs to be set if importing back into Derby. Default is false.
+- `--booleans-as-strings`, `-t`: If set, write boolean values as "true" or "false" instead of "1" and "0". This needs to be set if importing back into Derby. Default is false.
### Deep Storage Migration
@@ -69,8 +70,8 @@ By setting the options below, the tool will rewrite the segment load specs to po
This helps users migrate segments stored in local deep storage to S3.
-`--s3bucket`, `-b`: The S3 bucket that will hold the migrated segments
-`--s3baseKey`, `-k`: The base S3 key where the migrated segments will be stored
+- `--s3bucket`, `-b`: The S3 bucket that will hold the migrated segments
+- `--s3baseKey`, `-k`: The base S3 key where the migrated segments will be stored
When copying the local deep storage segments to S3, the rewrite performed by
this tool requires that the directory structure of the segments be unchanged.
@@ -142,6 +143,7 @@ java -classpath "lib/*" -Dlog4j.configurationFile=conf/druid/cluster/_common/log
```
In the example command above:
+
- `lib` is the Druid lib directory
- `extensions` is the Druid extensions directory
- `/tmp/csv` is the output directory. Please make sure that this directory
exists.
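The options documented above can be composed into an argument list; a sketch (the tool's entry point is elided here, only the flags shown in this doc are used, and all paths are illustrative):

```python
def export_metadata_args(connect_uri, user, password, base="druid",
                         output_path="/tmp/csv",
                         hex_blobs=False, booleans_as_strings=False):
    """Assemble export-metadata flags as documented above."""
    args = [
        "--connectURI", connect_uri,
        "--user", user,
        "--password", password,
        "--base", base,
        "-o", output_path,
    ]
    # Both flags below need to be set when the CSVs will be
    # imported back into Derby; both default to false.
    if hex_blobs:
        args.append("-x")
    if booleans_as_strings:
        args.append("-t")
    return args

uri = "jdbc:derby://localhost:1527/var/druid/metadata.db;create=true"
args = export_metadata_args(uri, "druid", "diurd",
                            hex_blobs=True, booleans_as_strings=True)
```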
diff --git a/docs/content/operations/metadata-migration.md b/docs/content/operations/metadata-migration.md
index 95c05ef..9f6b419 100644
--- a/docs/content/operations/metadata-migration.md
+++ b/docs/content/operations/metadata-migration.md
@@ -61,6 +61,7 @@ Update your Druid runtime properties with the new metadata configuration.
Druid provides a `metadata-init` tool for creating Druid's metadata tables.
After initializing the Druid database, you can run the commands shown below
from the root of the Druid package to initialize the tables.
In the example commands below:
+
- `lib` is the Druid lib directory
- `extensions` is the Druid extensions directory
- `base` corresponds to the value of `druid.metadata.storage.tables.base` in
the configuration, `druid` by default.
diff --git a/docs/content/operations/recommendations.md b/docs/content/operations/recommendations.md
index 61cb871..21100bb 100644
--- a/docs/content/operations/recommendations.md
+++ b/docs/content/operations/recommendations.md
@@ -59,6 +59,7 @@ JVM Flags:
Please note that the above flags are general guidelines only. Be cautious and feel
free to change them if necessary for the specific deployment.
Additionally, for large jvm heaps, here are a few Garbage Collection
efficiency guidelines that have been known to help in some cases.
+
- Mount /tmp on tmpfs ( See http://www.evanjones.ca/jvm-mmap-pause.html )
- On Disk-IO intensive processes (e.g. Historical and MiddleManager), GC and
Druid logs should be written to a different disk than where data is written.
- Disable Transparent Huge Pages ( See
https://blogs.oracle.com/linux/performance-issues-with-transparent-huge-pages-thp
)
diff --git a/docs/content/querying/aggregations.md b/docs/content/querying/aggregations.md
index ba9b80e..66acea0 100644
--- a/docs/content/querying/aggregations.md
+++ b/docs/content/querying/aggregations.md
@@ -337,6 +337,7 @@ The [Approximate Histogram](../development/extensions-core/approximate-histogram
The algorithm used by this deprecated aggregator is highly
distribution-dependent and its output is subject to serious distortions when
the input does not fit within the algorithm's limitations.
A [study published by the DataSketches team](https://datasketches.github.io/docs/Quantiles/DruidApproxHistogramStudy.html) demonstrates some of the known failure modes of this algorithm:
+
- The algorithm's quantile calculations can fail to provide results for a
large range of rank values (all ranks less than 0.89 in the example used in the
study), returning all zeroes instead.
- The algorithm can completely fail to record spikes in the tail ends of the
distribution
- In general, the histogram produced by the algorithm can deviate
significantly from the true histogram, with no bounds on the errors.
diff --git a/docs/content/tutorials/cluster.md b/docs/content/tutorials/cluster.md
index a202d25..f78cf5c 100644
--- a/docs/content/tutorials/cluster.md
+++ b/docs/content/tutorials/cluster.md
@@ -30,6 +30,7 @@ In this document, we'll set up a simple cluster and discuss how it can be furthe
your needs.
This simple cluster will feature:
+
- A Master server to host the Coordinator and Overlord processes
- Two scalable, fault-tolerant Data servers running Historical and
MiddleManager processes
- A query server, hosting the Druid Broker and Router processes
@@ -49,6 +50,7 @@ The Coordinator and Overlord processes are responsible for handling the metadata
In this example, we will be deploying the equivalent of one AWS
[m5.2xlarge](https://aws.amazon.com/ec2/instance-types/m5/) instance.
This hardware offers:
+
- 8 vCPUs
- 31 GB RAM
@@ -77,6 +79,7 @@ in-memory query cache. These servers benefit greatly from CPU and RAM.
In this example, we will be deploying the equivalent of one AWS
[m5.2xlarge](https://aws.amazon.com/ec2/instance-types/m5/) instance.
This hardware offers:
+
- 8 vCPUs
- 31 GB RAM
@@ -323,6 +326,7 @@ You can copy your existing `coordinator-overlord` configs from the single-server
Suppose we are migrating from a single-server deployment that had 32 CPU and
256GB RAM. In the old deployment, the following configurations for Historicals
and MiddleManagers were applied:
Historical (Single-server)
+
```
druid.processing.buffer.sizeBytes=500000000
druid.processing.numMergeBuffers=8
@@ -330,6 +334,7 @@ druid.processing.numThreads=31
```
MiddleManager (Single-server)
+
```
druid.worker.capacity=8
druid.indexer.fork.property.druid.processing.numMergeBuffers=2
@@ -340,11 +345,13 @@ druid.indexer.fork.property.druid.processing.numThreads=1
In the clustered deployment, we can choose a split factor (2 in this example),
and deploy 2 Data servers with 16CPU and 128GB RAM each. The areas to scale are
the following:
Historical
+
- `druid.processing.numThreads`: Set to `(num_cores - 1)` based on the new
hardware
- `druid.processing.numMergeBuffers`: Divide the old value from the
single-server deployment by the split factor
- `druid.processing.buffer.sizeBytes`: Keep this unchanged
MiddleManager:
+
- `druid.worker.capacity`: Divide the old value from the single-server
deployment by the split factor
- `druid.indexer.fork.property.druid.processing.numMergeBuffers`: Keep this
unchanged
- `druid.indexer.fork.property.druid.processing.buffer.sizeBytes`: Keep this
unchanged
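The scaling rules above can be sketched as a small helper (a sketch only; it just applies this section's divide-by-split-factor rules to the old single-server values):

```python
def split_data_server_configs(old_historical, old_middlemanager,
                              split_factor, new_num_cores):
    """Derive per-Data-server configs using the rules listed above."""
    historical = {
        # Re-derived from the new hardware, not divided.
        "druid.processing.numThreads": new_num_cores - 1,
        # Divided by the split factor.
        "druid.processing.numMergeBuffers":
            old_historical["druid.processing.numMergeBuffers"] // split_factor,
        # Kept unchanged.
        "druid.processing.buffer.sizeBytes":
            old_historical["druid.processing.buffer.sizeBytes"],
    }
    middlemanager = {
        # Divided by the split factor; the per-task fork properties
        # (numMergeBuffers, buffer.sizeBytes, numThreads) stay unchanged.
        "druid.worker.capacity":
            old_middlemanager["druid.worker.capacity"] // split_factor,
    }
    return historical, middlemanager
```

With the example's split factor of 2 and 16-core Data servers, this yields 15 processing threads per Historical and a worker capacity of 4 per MiddleManager.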
@@ -353,6 +360,7 @@ MiddleManager:
The resulting configs after the split:
New Historical (on 2 Data servers)
+
```
druid.processing.buffer.sizeBytes=500000000
druid.processing.numMergeBuffers=8
@@ -360,6 +368,7 @@ New Historical (on 2 Data servers)
```
New MiddleManager (on 2 Data servers)
+
```
druid.worker.capacity=4
druid.indexer.fork.property.druid.processing.numMergeBuffers=2
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]