This is an automated email from the ASF dual-hosted git repository.
abhishekrb pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/druid.git
The following commit(s) were added to refs/heads/master by this push:
new 24056b90b57 Bring back missing property in indexer documentation
(#16582)
24056b90b57 is described below
commit 24056b90b579fe0ff4765391e5f7a1741d9515e0
Author: Andreas Maechler <[email protected]>
AuthorDate: Mon Jun 10 17:52:54 2024 -0600
Bring back missing property in indexer documentation (#16582)
* Bring back druid.peon.taskActionClient.retry.minWait
* Update docs/configuration/index.md
* Consistent italics
Thanks Abhishek.
* Update docs/configuration/index.md
Co-authored-by: Abhishek Radhakrishnan <[email protected]>
* Consistent list style
* Remove extra space
---------
Co-authored-by: Abhishek Radhakrishnan <[email protected]>
---
docs/configuration/index.md | 127 ++++++++++++++++++++++----------------------
1 file changed, 63 insertions(+), 64 deletions(-)
diff --git a/docs/configuration/index.md b/docs/configuration/index.md
index 1976657c41e..4eceec8beec 100644
--- a/docs/configuration/index.md
+++ b/docs/configuration/index.md
@@ -22,14 +22,13 @@ title: "Configuration reference"
~ under the License.
-->
-
This page documents all of the configuration properties for each Druid service
type.
## Recommended configuration file organization
A recommended way of organizing Druid configuration files can be seen in the
`conf` directory in the Druid package root, shown below:
-```
+```sh
$ ls -R conf
druid
@@ -65,7 +64,7 @@ Common properties shared by all services are placed in
`_common/common.runtime.p
Configuration values can be interpolated from System Properties, Environment
Variables, or local files. Below is an example of how this can be used:
-```
+```properties
druid.metadata.storage.type=${env:METADATA_STORAGE_TYPE}
druid.processing.tmpDir=${sys:java.io.tmpdir}
druid.segmentCache.locations=${file:UTF-8:/config/segment-cache-def.json}
@@ -73,20 +72,20 @@
druid.segmentCache.locations=${file:UTF-8:/config/segment-cache-def.json}
Interpolation is also recursive so you can do:
-```
+```properties
druid.segmentCache.locations=${file:UTF-8:${env:SEGMENT_DEF_LOCATION}}
```
If the property is not set, an exception will be thrown on startup, but a
default can be provided if desired. Setting a default value will not work with
file interpolation as an exception will be thrown if the file does not exist.
-```
+```properties
druid.metadata.storage.type=${env:METADATA_STORAGE_TYPE:-mysql}
druid.processing.tmpDir=${sys:java.io.tmpdir:-/tmp}
```
If you need to set a variable that is wrapped by `${...}` but do not want it
to be interpolated, you can escape it by adding another `$`. For example:
-```
+```properties
config.name=$${value}
```
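As an aside (not part of this patch), the interpolation, default, and escaping syntaxes above can be combined in one `runtime.properties` sketch; the variable and property names here are illustrative:

```properties
# Falls back to "mysql" if METADATA_STORAGE_TYPE is unset
druid.metadata.storage.type=${env:METADATA_STORAGE_TYPE:-mysql}
# Stays as the literal string "${value}"; no interpolation occurs
config.name=$${value}
```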
@@ -98,16 +97,16 @@ The properties under this section are common configurations
that should be share
There are four JVM parameters that we set on all of our services:
-- `-Duser.timezone=UTC`: This sets the default timezone of the JVM to UTC. We
always set this and do not test with other default timezones, so local
timezones might work, but they also might uncover weird and interesting bugs.
To issue queries in a non-UTC timezone, see [query
granularities](../querying/granularities.md#period-granularities)
-- `-Dfile.encoding=UTF-8` This is similar to timezone, we test assuming
UTF-8. Local encodings might work, but they also might result in weird and
interesting bugs.
-- `-Djava.io.tmpdir=<a path>` Various parts of Druid use temporary files to
interact with the file system. These files can become quite large. This means
that systems that have small `/tmp` directories can cause problems for Druid.
Therefore, set the JVM tmp directory to a location with ample space.
+* `-Duser.timezone=UTC`: This sets the default timezone of the JVM to UTC. We
always set this and do not test with other default timezones, so local
timezones might work, but they also might uncover weird and interesting bugs.
To issue queries in a non-UTC timezone, see [query
granularities](../querying/granularities.md#period-granularities)
+* `-Dfile.encoding=UTF-8`: This is similar to the timezone setting; we test assuming UTF-8. Local encodings might work, but they also might result in weird and interesting bugs.
+* `-Djava.io.tmpdir=<a path>` Various parts of Druid use temporary files to
interact with the file system. These files can become quite large. This means
that systems that have small `/tmp` directories can cause problems for Druid.
Therefore, set the JVM tmp directory to a location with ample space.
Also consider the following when configuring the JVM tmp directory:
- - The temp directory should not be volatile tmpfs.
- - This directory should also have good read and write speed.
- - Avoid NFS mount.
- - The `org.apache.druid.java.util.metrics.SysMonitor` requires execute
privileges on files in `java.io.tmpdir`. If you are using the system monitor,
do not set `java.io.tmpdir` to `noexec`.
-- `-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager` This
allows log4j2 to handle logs for non-log4j2 components (like jetty) which use
standard java logging.
+ * The temp directory should not be volatile tmpfs.
+ * This directory should also have good read and write speed.
+ * Avoid NFS mount.
+ * The `org.apache.druid.java.util.metrics.SysMonitor` requires execute
privileges on files in `java.io.tmpdir`. If you are using the system monitor,
do not set `java.io.tmpdir` to `noexec`.
+* `-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager`: This allows log4j2 to handle logs for non-log4j2 components (such as Jetty) that use standard Java logging.
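As an illustrative, non-normative example, the four flags above typically land together in a service's `jvm.config` file; the heap, direct-memory, and path values below are placeholders to adapt to your hardware:

```
-server
-Xms1g
-Xmx1g
-XX:MaxDirectMemorySize=2g
-Duser.timezone=UTC
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=/opt/druid/var/tmp
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
```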
### Extensions
@@ -285,13 +284,13 @@ The format of request logs is TSV, one line per request, with five fields: time
For native JSON request, the `sql_query` field is empty. For example:
-```
+```txt
2019-01-14T10:00:00.000Z 127.0.0.1
{"queryType":"topN","dataSource":{"type":"table","name":"wikiticker"},"virtualColumns":[],"dimension":{"type":"LegacyDimensionSpec","dimension":"page","outputName":"page","outputType":"STRING"},"metric":{"type":"LegacyTopNMetricSpec","metric":"count"},"threshold":10,"intervals":{"type":"LegacySegmentSpec","intervals":["2015-09-12T00:00:00.000Z/2015-09-13T00:00:00.000Z"]},"filter":null,"granularity":{"type":"all"},"aggregations":[{"type":"count"
[...]
```
For SQL query request, the `native_query` field is empty. For example:
-```
+```txt
2019-01-14T10:00:00.000Z 127.0.0.1 {"sqlQuery/time":100,
"sqlQuery/planningTimeMs":10, "sqlQuery/bytes":600, "success":true,
"identity":"user1"} {"query":"SELECT page, COUNT(*) AS Edits FROM wikiticker
WHERE TIME_IN_INTERVAL(\"__time\", '2015-09-12/2015-09-13') GROUP BY page ORDER
BY Edits DESC LIMIT
10","context":{"sqlQueryId":"c9d035a0-5ffd-4a79-a865-3ffdadbb5fdd","nativeQueryIds":"[490978e4-f5c7-4cf6-b174-346e63cf8863]"}}
```
@@ -401,7 +400,7 @@ Metric monitoring is an essential part of Druid operations.
The following monito
|`org.apache.druid.server.metrics.SegmentStatsMonitor` | **EXPERIMENTAL**
Reports statistics about segments on Historical services. Available only on
Historical services. Not to be used when lazy loading is configured.|
|`org.apache.druid.server.metrics.QueryCountStatsMonitor`|Reports how many
queries have been successful/failed/interrupted.|
|`org.apache.druid.server.metrics.SubqueryCountStatsMonitor`|Reports how many
subqueries have been materialized as rows or bytes and various other statistics
related to the subquery execution|
-|`org.apache.druid.server.emitter.HttpEmittingMonitor`|Reports internal
metrics of `http` or `parametrized` emitter (see below). Must not be used with
another emitter type. See the description of the metrics here:
https://github.com/apache/druid/pull/4973.|
+|`org.apache.druid.server.emitter.HttpEmittingMonitor`|Reports internal
metrics of `http` or `parametrized` emitter (see below). Must not be used with
another emitter type. See the description of the metrics here:
<https://github.com/apache/druid/pull/4973>.|
|`org.apache.druid.server.metrics.TaskCountStatsMonitor`|Reports how many
ingestion tasks are currently running/pending/waiting and also the number of
successful/failed tasks per emission period.|
|`org.apache.druid.server.metrics.TaskSlotCountStatsMonitor`|Reports metrics
about task slot usage per emission period.|
|`org.apache.druid.server.metrics.WorkerTaskCountStatsMonitor`|Reports how
many ingestion tasks are currently running/pending/waiting, the number of
successful/failed tasks, and metrics about task slot usage for the reporting
worker, per emission period. Only supported by MiddleManager node types.|
@@ -409,7 +408,7 @@ Metric monitoring is an essential part of Druid operations.
The following monito
For example, you might configure monitors on all services for system and JVM
information within `common.runtime.properties` as follows:
-```
+```properties
druid.monitoring.monitors=["org.apache.druid.java.util.metrics.SysMonitor","org.apache.druid.java.util.metrics.JvmMonitor"]
```
@@ -419,13 +418,13 @@ You can override cluster-wide configuration by amending
the `runtime.properties`
There are several emitters available:
-- `noop` (default) disables metric emission.
-- [`logging`](#logging-emitter-module) emits logs using Log4j2.
-- [`http`](#http-emitter-module) sends `POST` requests of JSON events.
-- [`parametrized`](#parametrized-http-emitter-module) operates like the `http`
emitter but fine-tunes the recipient URL based on the event feed.
-- [`composing`](#composing-emitter-module) initializes multiple emitter
modules.
-- [`graphite`](#graphite-emitter) emits metrics to a
[Graphite](https://graphiteapp.org/) Carbon service.
-- [`switching`](#switching-emitter) initializes and emits to multiple emitter
modules based on the event feed.
+* `noop` (default) disables metric emission.
+* [`logging`](#logging-emitter-module) emits logs using Log4j2.
+* [`http`](#http-emitter-module) sends `POST` requests of JSON events.
+* [`parametrized`](#parametrized-http-emitter-module) operates like the `http`
emitter but fine-tunes the recipient URL based on the event feed.
+* [`composing`](#composing-emitter-module) initializes multiple emitter
modules.
+* [`graphite`](#graphite-emitter) emits metrics to a
[Graphite](https://graphiteapp.org/) Carbon service.
+* [`switching`](#switching-emitter) initializes and emits to multiple emitter
modules based on the event feed.
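For instance, a minimal sketch of switching from the default `noop` emitter to the `http` emitter in `common.runtime.properties`; the collector URL is a placeholder:

```properties
druid.emitter=http
druid.emitter.http.recipientBaseUrl=http://metrics-collector.example.com:8080/druid
```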
##### Logging emitter module
@@ -474,6 +473,7 @@ The following properties allow the HTTP Emitter to use its
own truststore config
The parametrized emitter takes the same configs as the [`http`
emitter](#http-emitter-module) using the prefix
`druid.emitter.parametrized.httpEmitting.`.
For example:
+
* `druid.emitter.parametrized.httpEmitting.flushMillis`
* `druid.emitter.parametrized.httpEmitting.flushCount`
* `druid.emitter.parametrized.httpEmitting.ssl.trustStorePath`
@@ -557,7 +557,7 @@ The below table shows some important configurations for S3.
See [S3 Deep Storage
|`druid.storage.bucket`|S3 bucket name.|none|
|`druid.storage.baseKey`|S3 object key prefix for storage.|none|
|`druid.storage.disableAcl`|Boolean flag for ACL. If this is set to `false`, full control is granted to the bucket owner, which may require setting additional permissions. See [S3 permissions settings](../development/extensions-core/s3.md#s3-permissions-settings).|false|
-|`druid.storage.archiveBucket`|S3 bucket name for archiving when running the
*archive task*.|none|
+|`druid.storage.archiveBucket`|S3 bucket name for archiving when running the
_archive task_.|none|
|`druid.storage.archiveBaseKey`|S3 object key prefix for archiving.|none|
|`druid.storage.sse.type`|Server-side encryption type. Should be one of `s3`,
`kms`, and `custom`. See the below [Server-side encryption
section](../development/extensions-core/s3.md#server-side-encryption) for more
details.|None|
|`druid.storage.sse.kms.keyId`|AWS KMS key ID. This is used only when
`druid.storage.sse.type` is `kms` and can be empty to use the default key
ID.|None|
@@ -581,7 +581,6 @@ This deep storage is used to interface with Cassandra. You
must load the `druid-
|`druid.storage.host`|Cassandra host.|none|
|`druid.storage.keyspace`|Cassandra key space.|none|
-
#### Centralized datasource schema
Centralized datasource schema is an [experimental feature](../development/experimental.md) to centralize datasource schema building within the Coordinator.
@@ -610,7 +609,6 @@ the [HDFS input
source](../ingestion/input-sources.md#hdfs-input-source).
|--------|---------------|-----------|-------|
|`druid.ingestion.hdfs.allowedProtocols`|List of protocols|Allowed protocols
for the HDFS input source and HDFS firehose.|`["hdfs"]`|
-
#### HTTP input source
You can set the following property to specify permissible protocols for
@@ -620,15 +618,15 @@ the [HTTP input
source](../ingestion/input-sources.md#http-input-source).
|--------|---------------|-----------|-------|
|`druid.ingestion.http.allowedProtocols`|List of protocols|Allowed protocols
for the HTTP input source and HTTP firehose.|`["http", "https"]`|
-
### External data access security configuration
#### JDBC connections to external databases
You can use the following properties to specify permissible JDBC options for:
-- [SQL input source](../ingestion/input-sources.md#sql-input-source)
-- [globally cached JDBC
lookups](../querying/lookups-cached-global.md#jdbc-lookup)
-- [JDBC Data Fetcher for per-lookup
caching](../development/extensions-core/druid-lookups.md#data-fetcher-layer).
+
+* [SQL input source](../ingestion/input-sources.md#sql-input-source)
+* [globally cached JDBC
lookups](../querying/lookups-cached-global.md#jdbc-lookup)
+* [JDBC Data Fetcher for per-lookup
caching](../development/extensions-core/druid-lookups.md#data-fetcher-layer).
These properties do not apply to metadata storage connections.
@@ -720,9 +718,10 @@ You can configure Druid API error responses to hide
internal information like th
You can use an error response transform strategy to transform error responses
from within Druid services to hide internal information.
When you specify an error response transform strategy other than `none`, Druid
transforms the error responses from Druid services as follows:
- - For any query API that fails in the Router service, Druid sets the fields
`errorClass` and `host` to null. Druid applies the transformation strategy to
the `errorMessage` field.
- - For any SQL query API that fails, for example `POST /druid/v2/sql/...`,
Druid sets the fields `errorClass` and `host` to null. Druid applies the
transformation strategy to the `errorMessage` field.
- - For any JDBC related exceptions, Druid will turn all checked exceptions
into `QueryInterruptedException` otherwise druid will attempt to keep the
exception as the same type. For example if the original exception isn't owned
by Druid it will become `QueryInterruptedException`. Druid applies the
transformation strategy to the `errorMessage` field.
+
+* For any query API that fails in the Router service, Druid sets the fields
`errorClass` and `host` to null. Druid applies the transformation strategy to
the `errorMessage` field.
+* For any SQL query API that fails, for example `POST /druid/v2/sql/...`,
Druid sets the fields `errorClass` and `host` to null. Druid applies the
transformation strategy to the `errorMessage` field.
+* For any JDBC-related exceptions, Druid turns all checked exceptions into `QueryInterruptedException`; otherwise, it attempts to keep the exception as the same type. For example, if the original exception isn't owned by Druid, it becomes `QueryInterruptedException`. Druid applies the transformation strategy to the `errorMessage` field.
###### No error response transform strategy
@@ -739,19 +738,19 @@ In this mode, Druid validates the error responses from
underlying services again
For example, consider the following error response:
-```
+```json
{"error":"Plan validation
failed","errorMessage":"org.apache.calcite.runtime.CalciteContextException:
From line 1, column 15 to line 1, column 38: Object 'nonexistent-datasource'
not
found","errorClass":"org.apache.calcite.tools.ValidationException","host":null}
```
If `druid.server.http.errorResponseTransform.allowedRegex` is set to `[]`,
Druid transforms the query error response to the following:
-```
+```json
{"error":"Plan validation
failed","errorMessage":null,"errorClass":null,"host":null}
```
On the other hand, if `druid.server.http.errorResponseTransform.allowedRegex`
is set to `[".*CalciteContextException.*"]` then Druid transforms the query
error response to the following:
-```
+```json
{"error":"Plan validation
failed","errorMessage":"org.apache.calcite.runtime.CalciteContextException:
From line 1, column 15 to line 1, column 38: Object 'nonexistent-datasource'
not found","errorClass":null,"host":null}
```
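The two behaviors above correspond to configuration along these lines (a sketch; the `allowedRegex` strategy name is assumed from this section's discussion):

```properties
druid.server.http.errorResponseTransform.strategy=allowedRegex
# Pass through only messages matching the regex; everything else becomes null
druid.server.http.errorResponseTransform.allowedRegex=[".*CalciteContextException.*"]
```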
@@ -820,6 +819,7 @@ Support for 64-bit floating point columns was released in
Druid 0.11.0, so if yo
|`druid.indexing.doubleStorage`|Set to "float" to use 32-bit double
representation for double columns.|double|
### SQL compatible null handling
+
These configurations are deprecated and will be removed in a future release, at which point Druid will always have SQL compatible null handling.
Prior to version 0.13.0, Druid string columns treated `''` and `null` values
as interchangeable, and numeric columns were unable to represent `null` values,
coercing `null` to `0`. Druid 0.13.0 introduced a mode which enabled SQL
compatible null handling, allowing string columns to distinguish empty strings
from nulls, and numeric columns to contain null rows.
@@ -1118,7 +1118,7 @@ These Overlord static configurations can be defined in
the `overlord/runtime.pro
|`druid.indexer.runner.type`|Indicates whether tasks should be run locally
using `local` or in a distributed environment using `remote`. The recommended
option is `httpRemote`, which is similar to `remote` but uses HTTP to interact
with Middle Managers instead of ZooKeeper.|`httpRemote`|
|`druid.indexer.storage.type`|Indicates whether incoming tasks should be
stored locally (in heap) or in metadata storage. One of `local` or `metadata`.
`local` is mainly for internal testing while `metadata` is recommended in
production because storing incoming tasks in metadata storage allows for tasks
to be resumed if the Overlord should fail.|`local`|
|`druid.indexer.storage.recentlyFinishedThreshold`|Duration of time to store
task results. Default is 24 hours. If you have hundreds of tasks running in a
day, consider increasing this threshold.|`PT24H`|
-|`druid.indexer.tasklock.forceTimeChunkLock`|_**Setting this to false is still
experimental**_<br/> If set, all tasks are enforced to use time chunk lock. If
not set, each task automatically chooses a lock type to use. This configuration
can be overwritten by setting `forceTimeChunkLock` in the [task
context](../ingestion/tasks.md#context). See [Task Locking &
Priority](../ingestion/tasks.md#context) for more details about locking in
tasks.|true|
+|`druid.indexer.tasklock.forceTimeChunkLock`|**Setting this to false is still experimental**<br/> If set to true, all tasks are forced to use time chunk locks. If set to false, each task automatically chooses a lock type to use. This configuration can be overwritten by setting `forceTimeChunkLock` in the [task context](../ingestion/tasks.md#context). See [Task Locking & Priority](../ingestion/tasks.md#context) for more details about locking in tasks.|true|
|`druid.indexer.tasklock.batchSegmentAllocation`| If set to true, Druid
performs segment allocate actions in batches to improve throughput and reduce
the average `task/action/run/time`. See [batching `segmentAllocate`
actions](../ingestion/tasks.md#batching-segmentallocate-actions) for
details.|true|
|`druid.indexer.tasklock.batchAllocationWaitTime`|Number of milliseconds after
Druid adds the first segment allocate action to a batch, until it executes the
batch. Allows the batch to add more requests and improve the average segment
allocation run time. This configuration takes effect only if
`batchSegmentAllocation` is enabled.|0|
|`druid.indexer.task.default.context`|Default task context that is applied to all tasks submitted to the Overlord. Defaults in this config override neither the context values the user provides nor `druid.indexer.tasklock.forceTimeChunkLock`.|empty context|
@@ -1239,29 +1239,29 @@ either an `affinityConfig` or a `categorySpec`. Then,
Druid assigns the task by
(`equalDistribution`) or to fill as many workers as possible to capacity
(`fillCapacity`).
There are 4 options for select strategies:
-- [`equalDistribution`](#equaldistribution)
-- [`equalDistributionWithCategorySpec`](#equaldistributionwithcategoryspec)
-- [`fillCapacity`](#fillcapacity)
-- [`fillCapacityWithCategorySpec`](#fillcapacitywithcategoryspec)
+* [`equalDistribution`](#equaldistribution)
+* [`equalDistributionWithCategorySpec`](#equaldistributionwithcategoryspec)
+* [`fillCapacity`](#fillcapacity)
+* [`fillCapacityWithCategorySpec`](#fillcapacitywithcategoryspec)
A `javascript` option is also available but should only be used for
prototyping new strategies.
If an `affinityConfig` is provided (as part of `fillCapacity` and
`equalDistribution` strategies) for a given task, the list of workers eligible
to be assigned is determined as follows:
-- a non-affinity worker if no affinity is specified for that datasource. Any
worker not listed in the `affinityConfig` is considered a non-affinity worker.
-- a non-affinity worker if preferred workers are not available and the
affinity is _weak_ i.e. `strong: false`.
-- a preferred worker listed in the `affinityConfig` for this datasource if it
has available capacity
-- no worker if preferred workers are not available and affinity is _strong_
i.e. `strong: true`. In this case, the task remains in "pending" state. The
chosen provisioning strategy (e.g. `pendingTaskBased`) may then use the total
number of pending tasks to determine if a new node should be provisioned.
+* a non-affinity worker if no affinity is specified for that datasource. Any worker not listed in the `affinityConfig` is considered a non-affinity worker.
+* a non-affinity worker if preferred workers are not available and the affinity is _weak_, i.e. `strong: false`.
+* a preferred worker listed in the `affinityConfig` for this datasource if it has available capacity.
+* no worker if preferred workers are not available and the affinity is _strong_, i.e. `strong: true`. In this case, the task remains in the "pending" state. The chosen provisioning strategy (e.g. `pendingTaskBased`) may then use the total number of pending tasks to determine if a new node should be provisioned.
Note that every worker listed in the `affinityConfig` is used only for its assigned datasources and no others.
If a `categorySpec` is provided (as part of `fillCapacityWithCategorySpec` and
`equalDistributionWithCategorySpec` strategies), then a task of a given
datasource may be assigned to:
-- any worker if no category config is given for task type
-- any worker if category config is given for task type but no category is
given for datasource and there's no default category
-- a preferred worker (based on category config and category for datasource) if
available
-- any worker if category config and category are given but no preferred worker
is available and category config is `weak`
-- not assigned at all if preferred workers are not available and category
config is `strong`
+* any worker if no category config is given for task type
+* any worker if category config is given for task type but no category is
given for datasource and there's no default category
+* a preferred worker (based on category config and category for datasource) if
available
+* any worker if category config and category are given but no preferred worker
is available and category config is `weak`
+* not assigned at all if preferred workers are not available and category
config is `strong`
In both cases, Druid determines the list of eligible workers and selects one depending on their load, with the goal of either distributing the load equally or filling as few workers as possible.
@@ -1299,7 +1299,7 @@ The following example shows tasks of type `index_kafka`
that default to running
"strong": false,
"categoryMap": {
"index_kafka": {
- "defaultCategory": "c1",
+ "defaultCategory": "c1",
"categoryAffinity": {
"ds1": "c2"
}
@@ -1437,7 +1437,7 @@ MiddleManagers pass their configurations down to their
child peons. The MiddleMa
|`druid.indexer.runner.compressZnodes`|Indicates whether or not the
MiddleManagers should compress Znodes.|true|
|`druid.indexer.runner.classpath`|Java classpath for the
peon.|`System.getProperty("java.class.path")`|
|`druid.indexer.runner.javaCommand`|Command required to execute java.|java|
-|`druid.indexer.runner.javaOpts`|*DEPRECATED* A string of -X Java options to
pass to the peon's JVM. Quotable parameters or parameters with spaces are
encouraged to use javaOptsArray|`''`|
+|`druid.indexer.runner.javaOpts`|_DEPRECATED_ A string of -X Java options to pass to the peon's JVM. For parameters that contain quotes or spaces, use `druid.indexer.runner.javaOptsArray` instead.|`''`|
|`druid.indexer.runner.javaOptsArray`|A JSON array of strings to be passed in
as options to the peon's JVM. This is additive to
`druid.indexer.runner.javaOpts` and is recommended for properly handling
arguments which contain quotes or spaces like `["-XX:OnOutOfMemoryError=kill -9
%p"]`|`[]`|
|`druid.indexer.runner.maxZnodeBytes`|The maximum size Znode in bytes that can
be created in ZooKeeper, should be in the range of [10KiB, 2GiB).
[Human-readable format](human-readable-byte.md) is supported.|512KiB|
|`druid.indexer.runner.startPort`|Starting port used for Peon services, should
be greater than 1023 and less than 65536.|8100|
@@ -1449,7 +1449,7 @@ MiddleManagers pass their configurations down to their
child peons. The MiddleMa
|`druid.worker.baseTaskDirs`|List of base temporary working directories, one
of which is assigned per task in a round-robin fashion. This property can be
used to allow usage of multiple disks for indexing. This property is
recommended in place of and takes precedence over
`${druid.indexer.task.baseTaskDir}`. If this configuration is not set,
`${druid.indexer.task.baseTaskDir}` is used. For example,
`druid.worker.baseTaskDirs=[\"PATH1\",\"PATH2\",...]`.|null|
|`druid.worker.baseTaskDirSize`|The total amount of bytes that can be used by
tasks on any single task dir. This value is treated symmetrically across all
directories, that is, if this is 500 GB and there are 3 `baseTaskDirs`, then
each of those task directories is assumed to allow for 500 GB to be used and a
total of 1.5 TB will potentially be available across all tasks. The actual
amount of memory assigned to each task is discussed in [Configuring task
storage sizes](../ingestion/tasks [...]
|`druid.worker.category`|A string to name the category that the MiddleManager
node belongs to.|`_default_worker_category`|
-|`druid.indexer.fork.property.druid.centralizedDatasourceSchema.enabled`| This
config should be set when [Centralized Datasource
Schema](#centralized-datasource-schema) feature is enabled. |false|
+|`druid.indexer.fork.property.druid.centralizedDatasourceSchema.enabled`| This
config should be set when [Centralized Datasource
Schema](#centralized-datasource-schema) feature is enabled. |false|
#### Peon processing
@@ -1488,11 +1488,11 @@ You can optionally configure caching to be enabled on
the peons by setting cachi
See [cache configuration](#cache-configuration) for how to configure cache
settings.
-
#### Additional Peon configuration
+
Although Peons inherit the configurations of their parent MiddleManagers,
explicit child Peon configs in MiddleManager can be set by prefixing them with:
-```
+```properties
druid.indexer.fork.property
```
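For example, a MiddleManager could pin a Peon-only override like this; the chosen property and value are illustrative:

```properties
# Child Peons get druid.processing.numThreads=4 regardless of the MiddleManager's own setting
druid.indexer.fork.property.druid.processing.numThreads=4
```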
@@ -1525,14 +1525,14 @@ If the Peon is running in remote mode, there must be an
Overlord up and running.
##### SegmentWriteOutMediumFactory
-When new segments are created, Druid temporarily stores some preprocessed data
in some buffers.
+When new segments are created, Druid temporarily stores some preprocessed data
in some buffers.
The following types of medium exist for the buffers:
* **Temporary files** (`tmpFile`) are stored under the task working directory (see the `druid.worker.baseTaskDirs` configuration above) and thus share its mounting properties. For example, they could be backed by HDD, SSD, or memory (tmpfs).
This type of medium may do unnecessary disk I/O and requires some disk space to be available.
* **Off-heap memory** (`offHeapMemory`) creates buffers in off-heap memory of
a JVM process that is running a task.
-This type of medium is preferred, but it may require to allow the JVM to have
more off-heap memory, by changing `-XX:MaxDirectMemorySize` configuration. It
is not yet understood how does the required off-heap memory size relates to the
size of the segments being created. But definitely it doesn't make sense to add
more extra off-heap memory, than the configured maximum *heap* size (`-Xmx`)
for the same JVM.
+This type of medium is preferred, but it may require allowing the JVM more off-heap memory by changing the `-XX:MaxDirectMemorySize` configuration. It is not yet understood how the required off-heap memory size relates to the size of the segments being created, but it certainly doesn't make sense to configure more off-heap memory than the maximum _heap_ size (`-Xmx`) for the same JVM.
* **On-heap memory** (`onHeapMemory`) creates buffers using the allocated heap
memory of the JVM process running a task. Using on-heap memory introduces
garbage collection overhead and so is not recommended in most cases. This type
of medium is most helpful for tasks run on external clusters where it may be
difficult to allocate and work with direct memory effectively.
@@ -1571,7 +1571,8 @@ For most types of tasks, `SegmentWriteOutMediumFactory`
can be configured per-ta
|`druid.indexer.task.hadoopWorkingPath`|Temporary working directory for Hadoop
tasks.|`/tmp/druid-indexing`|
|`druid.indexer.task.restoreTasksOnRestart`|If true, the Indexer will attempt
to stop tasks gracefully on shutdown and restore them on restart.|false|
|`druid.indexer.task.ignoreTimestampSpecForDruidInputSource`|If true, tasks
using the [Druid input source](../ingestion/input-sources.md) will ignore the
provided timestampSpec, and will use the `__time` column of the input
datasource. This option is provided for compatibility with ingestion specs
written before Druid 0.22.0.|false|
-|`druid.indexer.task.storeEmptyColumns`|Boolean value for whether or not to
store empty columns during ingestion. When set to true, Druid stores every
column specified in the
[`dimensionsSpec`](../ingestion/ingestion-spec.md#dimensionsspec). <br/><br/>If
you set `storeEmptyColumns` to false, Druid SQL queries referencing empty
columns will fail. If you intend to leave `storeEmptyColumns` disabled, you
should either ingest placeholder data for empty columns or else not query on
empty colu [...]
+|`druid.indexer.task.storeEmptyColumns`|Boolean value for whether or not to
store empty columns during ingestion. When set to true, Druid stores every
column specified in the
[`dimensionsSpec`](../ingestion/ingestion-spec.md#dimensionsspec). <br/><br/>If
you set `storeEmptyColumns` to false, Druid SQL queries referencing empty
columns will fail. If you intend to leave `storeEmptyColumns` disabled, you
should either ingest placeholder data for empty columns or else not query on
empty colu [...]
+|`druid.peon.taskActionClient.retry.minWait`|The minimum retry time to
communicate with Overlord.|`PT5S`|
|`druid.peon.taskActionClient.retry.maxWait`|The maximum retry time to
communicate with Overlord.|`PT1M`|
|`druid.peon.taskActionClient.retry.maxRetryCount`|The maximum number of
retries to communicate with Overlord.|60|
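Taken together, the task action client retry settings could be spelled out explicitly as follows; the values shown are the documented defaults:

```properties
druid.peon.taskActionClient.retry.minWait=PT5S
druid.peon.taskActionClient.retry.maxWait=PT1M
druid.peon.taskActionClient.retry.maxRetryCount=60
```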
@@ -1955,7 +1956,7 @@ The Druid SQL server is configured through the following properties on the Broke
|`druid.sql.planner.useApproximateCountDistinct`|Whether to use an approximate cardinality algorithm for `COUNT(DISTINCT foo)`.|true|
|`druid.sql.planner.useGroupingSetForExactDistinct`|Only relevant when `useApproximateCountDistinct` is disabled. If set to true, exact distinct queries are re-written using grouping sets. Otherwise, exact distinct queries are re-written using joins. This should be set to true for group by query with multiple exact distinct aggregations. This flag can be overridden per query.|false|
|`druid.sql.planner.useApproximateTopN`|Whether to use approximate [TopN queries](../querying/topnquery.md) when a SQL query could be expressed as such. If false, exact [GroupBy queries](../querying/groupbyquery.md) will be used instead.|true|
-|`druid.sql.planner.requireTimeCondition`|Whether to require SQL to have filter conditions on __time column so that all generated native queries will have user specified intervals. If true, all queries without filter condition on __time column will fail|false|
+|`druid.sql.planner.requireTimeCondition`|Whether to require SQL to have filter conditions on `__time` column so that all generated native queries will have user specified intervals. If true, all queries without filter condition on `__time` column will fail|false|
|`druid.sql.planner.sqlTimeZone`|Sets the default time zone for the server, which will affect how time functions and timestamp literals behave. Should be a time zone name like "America/Los_Angeles" or offset like "-08:00".|UTC|
|`druid.sql.planner.metadataSegmentCacheEnable`|Whether to keep a cache of published segments in broker. If true, broker polls coordinator in background to get segments from metadata store and maintains a local cache. If false, coordinator's REST API will be invoked when broker needs published segments info.|false|
|`druid.sql.planner.metadataSegmentPollPeriod`|How often to poll coordinator for published segments list if `druid.sql.planner.metadataSegmentCacheEnable` is set to true. Poll period is in milliseconds. |60000|
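The planner flags in this hunk are ordinary Broker runtime properties. A minimal sketch combining a few of them, under the assumption that the operator wants time-bounded queries and segment-metadata caching (illustrative values only):

```properties
# Require every SQL query to filter on __time (hypothetical choice).
druid.sql.planner.requireTimeCondition=true
# Evaluate time functions and timestamp literals in a non-default zone.
druid.sql.planner.sqlTimeZone=America/Los_Angeles
# Cache published-segment metadata on the Broker, polling the
# Coordinator every 60000 ms (the documented default period).
druid.sql.planner.metadataSegmentCacheEnable=true
druid.sql.planner.metadataSegmentPollPeriod=60000
```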
@@ -2017,7 +2018,6 @@ Use the `druid.cache.type` configuration to set a different kind of cache.
Cache settings are set globally, so the same configuration can be re-used for both Broker and Historical processes, when defined in the common properties file.
-
### Cache type
|Property|Possible Values|Description|Default|
@@ -2090,7 +2090,7 @@ Uses memcached as cache backend. This allows all processes to share the same cac
|`druid.cache.locator`| Memcached locator. Can be consistent or `array_mod`.|consistent|
|`druid.cache.enableTls`|Enable TLS based connection for Memcached client. Boolean.|false|
|`druid.cache.clientMode`|Client Mode. Static mode requires the user to specify individual cluster nodes. Dynamic mode uses [AutoDiscovery](https://docs.aws.amazon.com/AmazonElastiCache/latest/mem-ug/AutoDiscovery.HowAutoDiscoveryWorks.html) feature of AWS Memcached. String. ["static"](https://docs.aws.amazon.com/AmazonElastiCache/latest/mem-ug/AutoDiscovery.Manual.html) or ["dynamic"](https://docs.aws.amazon.com/AmazonElastiCache/latest/mem-ug/AutoDiscovery.Using.ModifyApp.Java.html)|static|
-|`druid.cache.skipTlsHostnameVerification`|Skip TLS Hostname Verification. Boolean.|true|
+|`druid.cache.skipTlsHostnameVerification`|Skip TLS Hostname Verification. Boolean.|true|
#### Hybrid
@@ -2181,7 +2181,6 @@ Supported query contexts:
|`maxMergingDictionarySize`|Can be used to lower the value of `druid.query.groupBy.maxMergingDictionarySize` for this query.|
|`maxOnDiskStorage`|Can be used to set `maxOnDiskStorage` to a value between 0 and `druid.query.groupBy.maxOnDiskStorage` for this query. If this query context override exceeds `druid.query.groupBy.maxOnDiskStorage`, the query will use `druid.query.groupBy.maxOnDiskStorage`. Omitting this from the query context will cause the query to use `druid.query.groupBy.defaultOnDiskStorage` for `maxOnDiskStorage`|
-
### Advanced configurations
Supported runtime properties:
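The `maxOnDiskStorage` context override described in the hunk above is bounded by two server-side runtime properties: a hard cap that clamps any larger per-query override, and a default used when the query supplies no override. A hedged sketch with hypothetical values:

```properties
# Hard per-query cap in bytes; context overrides above this are
# clamped down to it (illustrative value).
druid.query.groupBy.maxOnDiskStorage=1000000000
# Value used when a query supplies no maxOnDiskStorage context
# override; 0 disables disk spilling (illustrative value).
druid.query.groupBy.defaultOnDiskStorage=0
```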