This is an automated email from the ASF dual-hosted git repository.
abhishekrb pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/druid.git
The following commit(s) were added to refs/heads/master by this push:
new 24056b90b57 Bring back missing property in indexer documentation
(#16582)
24056b90b57 is described below
commit 24056b90b579fe0ff4765391e5f7a1741d9515e0
Author: Andreas Maechler <[email protected]>
AuthorDate: Mon Jun 10 17:52:54 2024 -0600
Bring back missing property in indexer documentation (#16582)
* Bring back druid.peon.taskActionClient.retry.minWait
* Update docs/configuration/index.md
* Consistent italics
Thanks Abhishek.
* Update docs/configuration/index.md
Co-authored-by: Abhishek Radhakrishnan <[email protected]>
* Consistent list style
* Remove extra space
---------
Co-authored-by: Abhishek Radhakrishnan <[email protected]>
---
docs/configuration/index.md | 127 ++++++++++++++++++++++----------------------
1 file changed, 63 insertions(+), 64 deletions(-)
diff --git a/docs/configuration/index.md b/docs/configuration/index.md
index 1976657c41e..4eceec8beec 100644
--- a/docs/configuration/index.md
+++ b/docs/configuration/index.md
@@ -22,14 +22,13 @@ title: "Configuration reference"
~ under the License.
-->
-
This page documents all of the configuration properties for each Druid service
type.
## Recommended configuration file organization
A recommended way of organizing Druid configuration files can be seen in the
`conf` directory in the Druid package root, shown below:
-```
+```sh
$ ls -R conf
druid
@@ -65,7 +64,7 @@ Common properties shared by all services are placed in
`_common/common.runtime.p
Configuration values can be interpolated from System Properties, Environment
Variables, or local files. Below is an example of how this can be used:
-```
+```properties
druid.metadata.storage.type=${env:METADATA_STORAGE_TYPE}
druid.processing.tmpDir=${sys:java.io.tmpdir}
druid.segmentCache.locations=${file:UTF-8:/config/segment-cache-def.json}
@@ -73,20 +72,20 @@
druid.segmentCache.locations=${file:UTF-8:/config/segment-cache-def.json}
Interpolation is also recursive so you can do:
-```
+```properties
druid.segmentCache.locations=${file:UTF-8:${env:SEGMENT_DEF_LOCATION}}
```
If the property is not set, an exception will be thrown on startup, but a
default can be provided if desired. Setting a default value will not work with
file interpolation as an exception will be thrown if the file does not exist.
-```
+```properties
druid.metadata.storage.type=${env:METADATA_STORAGE_TYPE:-mysql}
druid.processing.tmpDir=${sys:java.io.tmpdir:-/tmp}
```
If you need to set a variable that is wrapped by `${...}` but do not want it
to be interpolated, you can escape it by adding another `$`. For example:
-```
+```properties
config.name=$${value}
```
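As an aside (not part of this patch), the interpolation, default, and escaping syntaxes above can be combined in one `runtime.properties` sketch; the variable and property names here are illustrative:

```properties
# Falls back to "mysql" if METADATA_STORAGE_TYPE is unset
druid.metadata.storage.type=${env:METADATA_STORAGE_TYPE:-mysql}
# Stays as the literal string "${value}"; no interpolation occurs
config.name=$${value}
```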
@@ -98,16 +97,16 @@ The properties under this section are common configurations
that should be share
There are four JVM parameters that we set on all of our services:
-- `-Duser.timezone=UTC`: This sets the default timezone of the JVM to UTC. We
always set this and do not test with other default timezones, so local
timezones might work, but they also might uncover weird and interesting bugs.
To issue queries in a non-UTC timezone, see [query
granularities](../querying/granularities.md#period-granularities)
-- `-Dfile.encoding=UTF-8` This is similar to timezone, we test assuming
UTF-8. Local encodings might work, but they also might result in weird and
interesting bugs.
-- `-Djava.io.tmpdir=<a path>` Various parts of Druid use temporary files to
interact with the file system. These files can become quite large. This means
that systems that have small `/tmp` directories can cause problems for Druid.
Therefore, set the JVM tmp directory to a location with ample space.
+* `-Duser.timezone=UTC`: This sets the default timezone of the JVM to UTC. We
always set this and do not test with other default timezones, so local
timezones might work, but they also might uncover weird and interesting bugs.
To issue queries in a non-UTC timezone, see [query
granularities](../querying/granularities.md#period-granularities)
+* `-Dfile.encoding=UTF-8`: This is similar to the timezone setting; we test assuming UTF-8. Local encodings might work, but they also might result in weird and interesting bugs.
+* `-Djava.io.tmpdir=<a path>` Various parts of Druid use temporary files to
interact with the file system. These files can become quite large. This means
that systems that have small `/tmp` directories can cause problems for Druid.
Therefore, set the JVM tmp directory to a location with ample space.
Also consider the following when configuring the JVM tmp directory:
- - The temp directory should not be volatile tmpfs.
- - This directory should also have good read and write speed.
- - Avoid NFS mount.
- - The `org.apache.druid.java.util.metrics.SysMonitor` requires execute
privileges on files in `java.io.tmpdir`. If you are using the system monitor,
do not set `java.io.tmpdir` to `noexec`.
-- `-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager` This
allows log4j2 to handle logs for non-log4j2 components (like jetty) which use
standard java logging.
+ * The temp directory should not be volatile tmpfs.
+ * This directory should also have good read and write speed.
+ * Avoid NFS mount.
+ * The `org.apache.druid.java.util.metrics.SysMonitor` requires execute
privileges on files in `java.io.tmpdir`. If you are using the system monitor,
do not set `java.io.tmpdir` to `noexec`.
+* `-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager`: This allows log4j2 to handle logs for non-log4j2 components (such as Jetty) that use standard Java logging.
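As an illustrative, non-normative example, the four flags above typically land together in a service's `jvm.config` file; the heap, direct-memory, and path values below are placeholders to adapt to your hardware:

```
-server
-Xms1g
-Xmx1g
-XX:MaxDirectMemorySize=2g
-Duser.timezone=UTC
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=/opt/druid/var/tmp
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
```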
### Extensions
@@ -285,13 +284,13 @@ The format of request logs is TSV, one line per request, with five fields: time
For native JSON request, the `sql_query` field is empty. For example:
-```
+```txt
2019-01-14T10:00:00.000Z 127.0.0.1
{"queryType":"topN","dataSource":{"type":"table","name":"wikiticker"},"virtualColumns":[],"dimension":{"type":"LegacyDimensionSpec","dimension":"page","outputName":"page","outputType":"STRING"},"metric":{"type":"LegacyTopNMetricSpec","metric":"count"},"threshold":10,"intervals":{"type":"LegacySegmentSpec","intervals":["2015-09-12T00:00:00.000Z/2015-09-13T00:00:00.000Z"]},"filter":null,"granularity":{"type":"all"},"aggregations":[{"type":"count"
[...]
```
For SQL query request, the `native_query` field is empty. For example:
-```
+```txt
2019-01-14T10:00:00.000Z 127.0.0.1 {"sqlQuery/time":100,
"sqlQuery/planningTimeMs":10, "sqlQuery/bytes":600, "success":true,
"identity":"user1"} {"query":"SELECT page, COUNT(*) AS Edits FROM wikiticker
WHERE TIME_IN_INTERVAL(\"__time\", '2015-09-12/2015-09-13') GROUP BY page ORDER
BY Edits DESC LIMIT
10","context":{"sqlQueryId":"c9d035a0-5ffd-4a79-a865-3ffdadbb5fdd","nativeQueryIds":"[490978e4-f5c7-4cf6-b174-346e63cf8863]"}}
```
@@ -401,7 +400,7 @@ Metric monitoring is an essential part of Druid operations.
The following monito
|`org.apache.druid.server.metrics.SegmentStatsMonitor` | **EXPERIMENTAL**
Reports statistics about segments on Historical services. Available only on
Historical services. Not to be used when lazy loading is configured.|
|`org.apache.druid.server.metrics.QueryCountStatsMonitor`|Reports how many
queries have been successful/failed/interrupted.|
|`org.apache.druid.server.metrics.SubqueryCountStatsMonitor`|Reports how many
subqueries have been materialized as rows or bytes and various other statistics
related to the subquery execution|
-|`org.apache.druid.server.emitter.HttpEmittingMonitor`|Reports internal
metrics of `http` or `parametrized` emitter (see below). Must not be used with
another emitter type. See the description of the metrics here:
https://github.com/apache/druid/pull/4973.|
+|`org.apache.druid.server.emitter.HttpEmittingMonitor`|Reports internal
metrics of `http` or `parametrized` emitter (see below). Must not be used with
another emitter type. See the description of the metrics here:
<https://github.com/apache/druid/pull/4973>.|
|`org.apache.druid.server.metrics.TaskCountStatsMonitor`|Reports how many
ingestion tasks are currently running/pending/waiting and also the number of
successful/failed tasks per emission period.|
|`org.apache.druid.server.metrics.TaskSlotCountStatsMonitor`|Reports metrics
about task slot usage per emission period.|
|`org.apache.druid.server.metrics.WorkerTaskCountStatsMonitor`|Reports how
many ingestion tasks are currently running/pending/waiting, the number of
successful/failed tasks, and metrics about task slot usage for the reporting
worker, per emission period. Only supported by MiddleManager node types.|
@@ -409,7 +408,7 @@ Metric monitoring is an essential part of Druid operations.
The following monito
For example, you might configure monitors on all services for system and JVM
information within `common.runtime.properties` as follows:
-```
+```properties
druid.monitoring.monitors=["org.apache.druid.java.util.metrics.SysMonitor","org.apache.druid.java.util.metrics.JvmMonitor"]
```
@@ -419,13 +418,13 @@ You can override cluster-wide configuration by amending
the `runtime.properties`
There are several emitters available:
-- `noop` (default) disables metric emission.
-- [`logging`](#logging-emitter-module) emits logs using Log4j2.
-- [`http`](#http-emitter-module) sends `POST` requests of JSON events.
-- [`parametrized`](#parametrized-http-emitter-module) operates like the `http`
emitter but fine-tunes the recipient URL based on the event feed.
-- [`composing`](#composing-emitter-module) initializes multiple emitter
modules.
-- [`graphite`](#graphite-emitter) emits metrics to a
[Graphite](https://graphiteapp.org/) Carbon service.
-- [`switching`](#switching-emitter) initializes and emits to multiple emitter
modules based on the event feed.
+* `noop` (default) disables metric emission.
+* [`logging`](#logging-emitter-module) emits logs using Log4j2.
+* [`http`](#http-emitter-module) sends `POST` requests of JSON events.
+* [`parametrized`](#parametrized-http-emitter-module) operates like the `http`
emitter but fine-tunes the recipient URL based on the event feed.
+* [`composing`](#composing-emitter-module) initializes multiple emitter
modules.
+* [`graphite`](#graphite-emitter) emits metrics to a
[Graphite](https://graphiteapp.org/) Carbon service.
+* [`switching`](#switching-emitter) initializes and emits to multiple emitter
modules based on the event feed.
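For instance, a minimal sketch of switching from the default `noop` emitter to the `http` emitter in `common.runtime.properties`; the collector URL is a placeholder:

```properties
druid.emitter=http
druid.emitter.http.recipientBaseUrl=http://metrics-collector.example.com:8080/druid
```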
##### Logging emitter module
@@ -474,6 +473,7 @@ The following properties allow the HTTP Emitter to use its
own truststore config
The parametrized emitter takes the same configs as the [`http`
emitter](#http-emitter-module) using the prefix
`druid.emitter.parametrized.httpEmitting.`.
For example:
+
* `druid.emitter.parametrized.httpEmitting.flushMillis`
* `druid.emitter.parametrized.httpEmitting.flushCount`
* `druid.emitter.parametrized.httpEmitting.ssl.trustStorePath`
@@ -557,7 +557,7 @@ The below table shows some important configurations for S3.
See [S3 Deep Storage
|`druid.storage.bucket`|S3 bucket name.|none|
|`druid.storage.baseKey`|S3 object key prefix for storage.|none|
|`druid.storage.disableAcl`|Boolean flag for ACL. If this is set to `false`, full control is granted to the bucket owner, which may require setting additional permissions. See [S3 permissions settings](../development/extensions-core/s3.md#s3-permissions-settings).|false|
-|`druid.storage.archiveBucket`|S3 bucket name for archiving when running the
*archive task*.|none|
+|`druid.storage.archiveBucket`|S3 bucket name for archiving when running the
_archive task_.|none|
|`druid.storage.archiveBaseKey`|S3 object key prefix for archiving.|none|
|`druid.storage.sse.type`|Server-side encryption type. Should be one of `s3`,
`kms`, and `custom`. See the below [Server-side encryption
section](../development/extensions-core/s3.md#server-side-encryption) for more
details.|None|
|`druid.storage.sse.kms.keyId`|AWS KMS key ID. This is used only when
`druid.storage.sse.type` is `kms` and can be empty to use the default key
ID.|None|
@@ -581,7 +581,6 @@ This deep storage is used to interface with Cassandra. You
must load the `druid-
|`druid.storage.host`|Cassandra host.|none|
|`druid.storage.keyspace`|Cassandra key space.|none|
-
#### Centralized datasource schema
Centralized datasource schema is an [experimental feature](../development/experimental.md) to centralize datasource schema building within the Coordinator.
@@ -610,7 +609,6 @@ the [HDFS input
source](../ingestion/input-sources.md#hdfs-input-source).
|--------|---------------|-----------|-------|
|`druid.ingestion.hdfs.allowedProtocols`|List of protocols|Allowed protocols
for the HDFS input source and HDFS firehose.|`["hdfs"]`|
-
#### HTTP input source
You can set the following property to specify permissible protocols for
@@ -620,15 +618,15 @@ the [HTTP input
source](../ingestion/input-sources.md#http-input-source).
|--------|---------------|-----------|-------|
|`druid.ingestion.http.allowedProtocols`|List of protocols|Allowed protocols
for the HTTP input source and HTTP firehose.|`["http", "https"]`|
-
### External data access security configuration
#### JDBC connections to external databases
You can use the following properties to specify permissible JDBC options for:
-- [SQL input source](../ingestion/input-sources.md#sql-input-source)
-- [globally cached JDBC
lookups](../querying/lookups-cached-global.md#jdbc-lookup)
-- [JDBC Data Fetcher for per-lookup
caching](../development/extensions-core/druid-lookups.md#data-fetcher-layer).
+
+* [SQL input source](../ingestion/input-sources.md#sql-input-source)
+* [globally cached JDBC
lookups](../querying/lookups-cached-global.md#jdbc-lookup)
+* [JDBC Data Fetcher for per-lookup
caching](../development/extensions-core/druid-lookups.md#data-fetcher-layer).
These properties do not apply to metadata storage connections.
@@ -720,9 +718,10 @@ You can configure Druid API error responses to hide
internal information like th
You can use an error response transform strategy to transform error responses
from within Druid services to hide internal information.
When you specify an error response transform strategy other than `none`, Druid
transforms the error responses from Druid services as follows:
- - For any query API that fails in the Router service, Druid sets the fields
`errorClass` and `host` to null. Druid applies the transformation strategy to
the `errorMessage` field.
- - For any SQL query API that fails, for example `POST /druid/v2/sql/...`,
Druid sets the fields `errorClass` and `host` to null. Druid applies the
transformation strategy to the `errorMessage` field.
- - For any JDBC related exceptions, Druid will turn all checked exceptions
into `QueryInterruptedException` otherwise druid will attempt to keep the
exception as the same type. For example if the original exception isn't owned
by Druid it will become `QueryInterruptedException`. Druid applies the
transformation strategy to the `errorMessage` field.
+
+* For any query API that fails in the Router service, Druid sets the fields
`errorClass` and `host` to null. Druid applies the transformation strategy to
the `errorMessage` field.
+* For any SQL query API that fails, for example `POST /druid/v2/sql/...`,
Druid sets the fields `errorClass` and `host` to null. Druid applies the
transformation strategy to the `errorMessage` field.
+* For any JDBC-related exceptions, Druid turns all checked exceptions into `QueryInterruptedException`; otherwise, it attempts to keep the exception as the same type. For example, if the original exception isn't owned by Druid, it becomes `QueryInterruptedException`. Druid applies the transformation strategy to the `errorMessage` field.
###### No error response transform strategy
@@ -739,19 +738,19 @@ In this mode, Druid validates the error responses from
underlying services again
For example, consider the following error response:
-```
+```json
{"error":"Plan validation
failed","errorMessage":"org.apache.calcite.runtime.CalciteContextException:
From line 1, column 15 to line 1, column 38: Object 'nonexistent-datasource'
not
found","errorClass":"org.apache.calcite.tools.ValidationException","host":null}
```
If `druid.server.http.errorResponseTransform.allowedRegex` is set to `[]`,
Druid transforms the query error response to the following:
-```
+```json
{"error":"Plan validation
failed","errorMessage":null,"errorClass":null,"host":null}
```
On the other hand, if `druid.server.http.errorResponseTransform.allowedRegex`
is set to `[".*CalciteContextException.*"]` then Druid transforms the query
error response to the following:
-```
+```json
{"error":"Plan validation
failed","errorMessage":"org.apache.calcite.runtime.CalciteContextException:
From line 1, column 15 to line 1, column 38: Object 'nonexistent-datasource'
not found","errorClass":null,"host":null}
```
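The two behaviors above correspond to configuration along these lines (a sketch; the `allowedRegex` strategy name is assumed from this section's discussion):

```properties
druid.server.http.errorResponseTransform.strategy=allowedRegex
# Pass through only messages matching the regex; everything else becomes null
druid.server.http.errorResponseTransform.allowedRegex=[".*CalciteContextException.*"]
```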
@@ -820,6 +819,7 @@ Support for 64-bit floating point columns was released in
Druid 0.11.0, so if yo
|`druid.indexing.doubleStorage`|Set to "float" to use 32-bit double
representation for double columns.|double|
### SQL compatible null handling
+
These configurations are deprecated and will be removed in a future release, at which point Druid will always have SQL compatible null handling.
Prior to version 0.13.0, Druid string columns treated `''` and `null` values
as interchangeable, and numeric columns were unable to represent `null` values,
coercing `null` to `0`. Druid 0.13.0 introduced a mode which enabled SQL
compatible null handling, allowing string columns to distinguish empty strings
from nulls, and numeric columns to contain null rows.
@@ -1118,7 +1118,7 @@ These Overlord static configurations can be defined in
the `overlord/runtime.pro
|`druid.indexer.runner.type`|Indicates whether tasks should be run locally
using `local` or in a distributed environment using `remote`. The recommended
option is `httpRemote`, which is similar to `remote` but uses HTTP to interact
with Middle Managers instead of ZooKeeper.|`httpRemote`|
|`druid.indexer.storage.type`|Indicates whether incoming tasks should be
stored locally (in heap) or in metadata storage. One of `local` or `metadata`.
`local` is mainly for internal testing while `metadata` is recommended in
production because storing incoming tasks in metadata storage allows for tasks
to be resumed if the Overlord should fail.|`local`|
|`druid.indexer.storage.recentlyFinishedThreshold`|Duration of time to store
task results. Default is 24 hours. If you have hundreds of tasks running in a
day, consider increasing this threshold.|`PT24H`|
-|`druid.indexer.tasklock.forceTimeChunkLock`|_**Setting this to false is still
experimental**_<br/> If set, all tasks are enforced to use time chunk lock. If
not set, each task automatically chooses a lock type to use. This configuration
can be overwritten by setting `forceTimeChunkLock` in the [task
context](../ingestion/tasks.md#context). See [Task Locking &
Priority](../ingestion/tasks.md#context) for more details about locking in
tasks.|true|
+|`druid.indexer.tasklock.forceTimeChunkLock`|**Setting this to false is still experimental**<br/> If set to true, all tasks are forced to use time chunk locks. If set to false, each task automatically chooses a lock type to use. This configuration can be overwritten by setting `forceTimeChunkLock` in the [task context](../ingestion/tasks.md#context). See [Task Locking & Priority](../ingestion/tasks.md#context) for more details about locking in tasks.|true|
|`druid.indexer.tasklock.batchSegmentAllocation`| If set to true, Druid
performs segment allocate actions in batches to improve throughput and reduce
the average `task/action/run/time`. See [batching `segmentAllocate`
actions](../ingestion/tasks.md#batching-segmentallocate-actions) for
details.|true|
|`druid.indexer.tasklock.batchAllocationWaitTime`|Number of milliseconds after
Druid adds the first segment allocate action to a batch, until it executes the
batch. Allows the batch to add more requests and improve the average segment
allocation run time. This configuration takes effect only if
`batchSegmentAllocation` is enabled.|0|
|`druid.indexer.task.default.context`|Default task context that is applied to all tasks submitted to the Overlord. Defaults in this config override neither the context values the user provides nor `druid.indexer.tasklock.forceTimeChunkLock`.|empty context|
@@ -1239,29 +1239,29 @@ either an `affinityConfig` or a `categorySpec`. Then,
Druid assigns the task by
(`equalDistribution`) or to fill as many workers as possible to capacity
(`fillCapacity`).
There are 4 options for select strategies:
-- [`equalDistribution`](#equaldistribution)
-- [`equalDistributionWithCategorySpec`](#equaldistributionwithcategoryspec)
-- [`fillCapacity`](#fillcapacity)
-- [`fillCapacityWithCategorySpec`](#fillcapacitywithcategoryspec)
+* [`equalDistribution`](#equaldistribution)
+* [`equalDistributionWithCategorySpec`](#equaldistributionwithcategoryspec)
+* [`fillCapacity`](#fillcapacity)
+* [`fillCapacityWithCategorySpec`](#fillcapacitywithcategoryspec)
A `javascript` option is also available but should only be used for
prototyping new strategies.
If an `affinityConfig` is provided (as part of `fillCapacity` and
`equalDistribution` strategies) for a given task, the list of workers eligible
to be assigned is determined as follows:
-- a non-affinity worker if no affinity is specified for that datasource. Any
worker not listed in the `affinityConfig` is considered a non-affinity worker.
-- a non-affinity worker if preferred workers are not available and the
affinity is _weak_ i.e. `strong: false`.
-- a preferred worker listed in the `affinityConfig` for this datasource if it
has available capacity
-- no worker if preferred workers are not available and affinity is _strong_
i.e. `strong: true`. In this case, the task remains in "pending" state. The
chosen provisioning strategy (e.g. `pendingTaskBased`) may then use the total
number of pending tasks to determine if a new node should be provisioned.
+* a non-affinity worker if no affinity is specified for that datasource. Any worker not listed in the `affinityConfig` is considered a non-affinity worker.
+* a non-affinity worker if preferred workers are not available and the affinity is _weak_, i.e. `strong: false`.
+* a preferred worker listed in the `affinityConfig` for this datasource if it has available capacity.
+* no worker if preferred workers are not available and the affinity is _strong_, i.e. `strong: true`. In this case, the task remains in the "pending" state. The chosen provisioning strategy (e.g. `pendingTaskBased`) may then use the total number of pending tasks to determine if a new node should be provisioned.
Note that every worker listed in the `affinityConfig` is used only for its assigned datasources and no others.
If a `categorySpec` is provided (as part of `fillCapacityWithCategorySpec` and
`equalDistributionWithCategorySpec` strategies), then a task of a given
datasource may be assigned to:
-- any worker if no category config is given for task type
-- any worker if category config is given for task type but no category is
given for datasource and there's no default category
-- a preferred worker (based on category config and category for datasource) if
available
-- any worker if category config and category are given but no preferred worker
is available and category config is `weak`
-- not assigned at all if preferred workers are not available and category
config is `strong`
+* any worker if no category config is given for task type
+* any worker if category config is given for task type but no category is
given for datasource and there's no default category
+* a preferred worker (based on category config and category for datasource) if
available
+* any worker if category config and category are given but no preferred worker
is available and category config is `weak`
+* not assigned at all if preferred workers are not available and category
config is `strong`
In both cases, Druid determines the list of eligible workers and selects one depending on their load, with the goal of either distributing the load equally or filling as few workers as possible.
@@ -1299,7 +1299,7 @@ The following example shows tasks of type `index_kafka`
that default to running
"strong": false,
"categoryMap": {
"index_kafka": {
- "defaultCategory": "c1",
+ "defaultCategory": "c1",
"categoryAffinity": {
"ds1": "c2"
}
@@ -1437,7 +1437,7 @@ MiddleManagers pass their configurations down to their
child peons. The MiddleMa
|`druid.indexer.runner.compressZnodes`|Indicates whether or not the
MiddleManagers should compress Znodes.|true|
|`druid.indexer.runner.classpath`|Java classpath for the
peon.|`System.getProperty("java.class.path")`|
|`druid.indexer.runner.javaCommand`|Command required to execute java.|java|
-|`druid.indexer.runner.javaOpts`|*DEPRECATED* A string of -X Java options to
pass to the peon's JVM. Quotable parameters or parameters with spaces are
encouraged to use javaOptsArray|`''`|
+|`druid.indexer.runner.javaOpts`|_DEPRECATED_ A string of -X Java options to pass to the peon's JVM. For parameters that contain quotes or spaces, use `druid.indexer.runner.javaOptsArray` instead.|`''`|
|`druid.indexer.runner.javaOptsArray`|A JSON array of strings to be passed in
as options to the peon's JVM. This is additive to
`druid.indexer.runner.javaOpts` and is recommended for properly handling
arguments which contain quotes or spaces like `["-XX:OnOutOfMemoryError=kill -9
%p"]`|`[]`|
|`druid.indexer.runner.maxZnodeBytes`|The maximum size Znode in bytes that can
be created in ZooKeeper, should be in the range of [10KiB, 2GiB).
[Human-readable format](human-readable-byte.md) is supported.|512KiB|
|`druid.indexer.runner.startPort`|Starting port used for Peon services, should
be greater than 1023 and less than 65536.|8100|
@@ -1449,7 +1449,7 @@ MiddleManagers pass their configurations down to their
child peons. The MiddleMa
|`druid.worker.baseTaskDirs`|List of base temporary working directories, one
of which is assigned per task in a round-robin fashion. This property can be
used to allow usage of multiple disks for indexing. This property is
recommended in place of and takes precedence over
`${druid.indexer.task.baseTaskDir}`. If this configuration is not set,
`${druid.indexer.task.baseTaskDir}` is used. For example,
`druid.worker.baseTaskDirs=[\"PATH1\",\"PATH2\",...]`.|null|
|`druid.worker.baseTaskDirSize`|The total amount of bytes that can be used by
tasks on any single task dir. This value is treated symmetrically across all
directories, that is, if this is 500 GB and there are 3 `baseTaskDirs`, then
each of those task directories is assumed to allow for 500 GB to be used and a
total of 1.5 TB will potentially be available across all tasks. The actual
amount of memory assigned to each task is discussed in [Configuring task
storage sizes](../ingestion/tasks [...]
|`druid.worker.category`|A string to name the category that the MiddleManager
node belongs to.|`_default_worker_category`|
-|`druid.indexer.fork.property.druid.centralizedDatasourceSchema.enabled`| This
config should be set when [Centralized Datasource
Schema](#centralized-datasource-schema) feature is enabled. |false|
+|`druid.indexer.fork.property.druid.centralizedDatasourceSchema.enabled`| This
config should be set when [Centralized Datasource
Schema](#centralized-datasource-schema) feature is enabled. |false|
#### Peon processing
@@ -1488,11 +1488,11 @@ You can optionally configure caching to be enabled on
the peons by setting cachi
See [cache configuration](#cache-configuration) for how to configure cache
settings.
-
#### Additional Peon configuration
+
Although Peons inherit the configurations of their parent MiddleManagers,
explicit child Peon configs in MiddleManager can be set by prefixing them with:
-```
+```properties
druid.indexer.fork.property
```
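For example, a MiddleManager could pin a Peon-only override like this; the chosen property and value are illustrative:

```properties
# Child Peons get druid.processing.numThreads=4 regardless of the MiddleManager's own setting
druid.indexer.fork.property.druid.processing.numThreads=4
```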
@@ -1525,14 +1525,14 @@ If the Peon is running in remote mode, there must be an
Overlord up and running.
##### SegmentWriteOutMediumFactory
-When new segments are created, Druid temporarily stores some preprocessed data
in some buffers.
+When new segments are created, Druid temporarily stores some preprocessed data
in some buffers.
The following types of medium exist for the buffers:
* **Temporary files** (`tmpFile`) are stored under the task working directory (see the `druid.worker.baseTaskDirs` configuration above) and thus share its mounting properties. For example, they could be backed by HDD, SSD, or memory (tmpfs).
This type of medium may do unnecessary disk I/O and requires some disk space to be available.
* **Off-heap memory** (`offHeapMemory`) creates buffers in off-heap memory of
a JVM process that is running a task.
-This type of medium is preferred, but it may require to allow the JVM to have
more off-heap memory, by changing `-XX:MaxDirectMemorySize` configuration. It
is not yet understood how does the required off-heap memory size relates to the
size of the segments being created. But definitely it doesn't make sense to add
more extra off-heap memory, than the configured maximum *heap* size (`-Xmx`)
for the same JVM.
+This type of medium is preferred, but it may require allowing the JVM more off-heap memory by changing the `-XX:MaxDirectMemorySize` configuration. It is not yet understood how the required off-heap memory size relates to the size of the segments being created, but it certainly doesn't make sense to configure more off-heap memory than the maximum _heap_ size (`-Xmx`) for the same JVM.
* **On-heap memory** (`onHeapMemory`) creates buffers using the allocated heap
memory of the JVM process running a task. Using on-heap memory introduces
garbage collection overhead and so is not recommended in most cases. This type
of medium is most helpful for tasks run on external clusters where it may be
difficult to allocate and work with direct memory effectively.
@@ -1571,7 +1571,8 @@ For most types of tasks, `SegmentWriteOutMediumFactory`
can be configured per-ta
|`druid.indexer.task.hadoopWorkingPath`|Temporary working directory for Hadoop
tasks.|`/tmp/druid-indexing`|
|`druid.indexer.task.restoreTasksOnRestart`|If true, the Indexer will attempt
to stop tasks gracefully on shutdown and restore them on restart.|false|
|`druid.indexer.task.ignoreTimestampSpecForDruidInputSource`|If true, tasks
using the [Druid input source](../ingestion/input-sources.md) will ignore the
provided timestampSpec, and will use the `__time` column of the input
datasource. This option is provided for compatibility with ingestion specs
written before Druid 0.22.0.|false|
-|`druid.indexer.task.storeEmptyColumns`|Boolean value for whether or not to
store empty columns during ingestion. When set to true, Druid stores every
column specified in the
[`dimensionsSpec`](../ingestion/ingestion-spec.md#dimensionsspec). <br/><br/>If
you set `storeEmptyColumns` to false, Druid SQL queries referencing empty
columns will fail. If you intend to leave `storeEmptyColumns` disabled, you
should either ingest placeholder data for empty columns or else not query on
empty colu [...]
+|`druid.indexer.task.storeEmptyColumns`|Boolean value for whether or not to
store empty columns during ingestion. When set to true, Druid stores every
column specified in the
[`dimensionsSpec`](../ingestion/ingestion-spec.md#dimensionsspec). <br/><br/>If
you set `storeEmptyColumns` to false, Druid SQL queries referencing empty
columns will fail. If you intend to leave `storeEmptyColumns` disabled, you
should either ingest placeholder data for empty columns or else not query on
empty colu [...]
+|`druid.peon.taskActionClient.retry.minWait`|The minimum retry time to
communicate with Overlord.|`PT5S`|
|`druid.peon.taskActionClient.retry.maxWait`|The maximum retry time to
communicate with Overlord.|`PT1M`|
|`druid.peon.taskActionClient.retry.maxRetryCount`|The maximum number of
retries to communicate with Overlord.|60|
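Taken together, the task action client retry settings could be spelled out explicitly as follows; the values shown are the documented defaults:

```properties
druid.peon.taskActionClient.retry.minWait=PT5S
druid.peon.taskActionClient.retry.maxWait=PT1M
druid.peon.taskActionClient.retry.maxRetryCount=60
```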
@@ -1955,7 +1956,7 @@ The Druid SQL server is configured through the following properties on the Broke
|`druid.sql.planner.useApproximateCountDistinct`|Whether to use an approximate cardinality algorithm for `COUNT(DISTINCT foo)`.|true|
|`druid.sql.planner.useGroupingSetForExactDistinct`|Only relevant when `useApproximateCountDistinct` is disabled. If set to true, exact distinct queries are re-written using grouping sets. Otherwise, exact distinct queries are re-written using joins. This should be set to true for group by query with multiple exact distinct aggregations. This flag can be overridden per query.|false|
|`druid.sql.planner.useApproximateTopN`|Whether to use approximate [TopN queries](../querying/topnquery.md) when a SQL query could be expressed as such. If false, exact [GroupBy queries](../querying/groupbyquery.md) will be used instead.|true|
-|`druid.sql.planner.requireTimeCondition`|Whether to require SQL to have filter conditions on __time column so that all generated native queries will have user specified intervals. If true, all queries without filter condition on __time column will fail|false|
+|`druid.sql.planner.requireTimeCondition`|Whether to require SQL to have filter conditions on `__time` column so that all generated native queries will have user specified intervals. If true, all queries without filter condition on `__time` column will fail|false|
|`druid.sql.planner.sqlTimeZone`|Sets the default time zone for the server, which will affect how time functions and timestamp literals behave. Should be a time zone name like "America/Los_Angeles" or offset like "-08:00".|UTC|
|`druid.sql.planner.metadataSegmentCacheEnable`|Whether to keep a cache of published segments in broker. If true, broker polls coordinator in background to get segments from metadata store and maintains a local cache. If false, coordinator's REST API will be invoked when broker needs published segments info.|false|
|`druid.sql.planner.metadataSegmentPollPeriod`|How often to poll coordinator for published segments list if `druid.sql.planner.metadataSegmentCacheEnable` is set to true. Poll period is in milliseconds. |60000|
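The planner flags in this hunk are ordinary Broker runtime properties. A minimal sketch combining a few of them, under the assumption that the operator wants time-bounded queries and segment-metadata caching (illustrative values only):

```properties
# Require every SQL query to filter on __time (hypothetical choice).
druid.sql.planner.requireTimeCondition=true
# Evaluate time functions and timestamp literals in a non-default zone.
druid.sql.planner.sqlTimeZone=America/Los_Angeles
# Cache published-segment metadata on the Broker, polling the
# Coordinator every 60000 ms (the documented default period).
druid.sql.planner.metadataSegmentCacheEnable=true
druid.sql.planner.metadataSegmentPollPeriod=60000
```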
@@ -2017,7 +2018,6 @@ Use the `druid.cache.type` configuration to set a different kind of cache.
Cache settings are set globally, so the same configuration can be re-used for both Broker and Historical processes, when defined in the common properties file.
-
### Cache type
|Property|Possible Values|Description|Default|
@@ -2090,7 +2090,7 @@ Uses memcached as cache backend. This allows all processes to share the same cac
|`druid.cache.locator`| Memcached locator. Can be consistent or `array_mod`.|consistent|
|`druid.cache.enableTls`|Enable TLS based connection for Memcached client. Boolean.|false|
|`druid.cache.clientMode`|Client Mode. Static mode requires the user to specify individual cluster nodes. Dynamic mode uses [AutoDiscovery](https://docs.aws.amazon.com/AmazonElastiCache/latest/mem-ug/AutoDiscovery.HowAutoDiscoveryWorks.html) feature of AWS Memcached. String. ["static"](https://docs.aws.amazon.com/AmazonElastiCache/latest/mem-ug/AutoDiscovery.Manual.html) or ["dynamic"](https://docs.aws.amazon.com/AmazonElastiCache/latest/mem-ug/AutoDiscovery.Using.ModifyApp.Java.html)|static|
-|`druid.cache.skipTlsHostnameVerification`|Skip TLS Hostname Verification. Boolean.|true|
+|`druid.cache.skipTlsHostnameVerification`|Skip TLS Hostname Verification. Boolean.|true|
#### Hybrid
@@ -2181,7 +2181,6 @@ Supported query contexts:
|`maxMergingDictionarySize`|Can be used to lower the value of `druid.query.groupBy.maxMergingDictionarySize` for this query.|
|`maxOnDiskStorage`|Can be used to set `maxOnDiskStorage` to a value between 0 and `druid.query.groupBy.maxOnDiskStorage` for this query. If this query context override exceeds `druid.query.groupBy.maxOnDiskStorage`, the query will use `druid.query.groupBy.maxOnDiskStorage`. Omitting this from the query context will cause the query to use `druid.query.groupBy.defaultOnDiskStorage` for `maxOnDiskStorage`|
-
### Advanced configurations
Supported runtime properties:
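The `maxOnDiskStorage` context override described in the hunk above is bounded by two server-side runtime properties: a hard cap that clamps any larger per-query override, and a default used when the query supplies no override. A hedged sketch with hypothetical values:

```properties
# Hard per-query cap in bytes; context overrides above this are
# clamped down to it (illustrative value).
druid.query.groupBy.maxOnDiskStorage=1000000000
# Value used when a query supplies no maxOnDiskStorage context
# override; 0 disables disk spilling (illustrative value).
druid.query.groupBy.defaultOnDiskStorage=0
```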