sthetland commented on a change in pull request #11541:
URL: https://github.com/apache/druid/pull/11541#discussion_r686386422
##########
File path: docs/ingestion/index.md
##########
@@ -59,6 +50,8 @@ The most recommended, and most popular, method of streaming
ingestion is the
[Kafka indexing service](../development/extensions-core/kafka-ingestion.md)
that reads directly from Kafka. Alternatively, the Kinesis
indexing service works with Amazon Kinesis Data Streams.
+Streaming ingestion uses an onging process called a supervisor that reads from
the data stream to ingest data into Druid.
Review comment:
```suggestion
Streaming ingestion uses an ongoing process called a supervisor, which reads
from the data stream to ingest data into Druid.
```
Or "Streaming ingestion uses an ongoing process called a supervisor, which
ingests data into Druid by reading from data streams."
##########
File path: docs/ingestion/data-formats.md
##########
@@ -215,21 +218,22 @@ The `inputFormat` to load data of Parquet format. An
example is:
}
```
-The Parquet `inputFormat` has the following components:
+### Avro Stream
-| Field | Type | Description | Required |
-|-------|------|-------------|----------|
-|type| String| This should be set to `parquet` to read Parquet file| yes |
-|flattenSpec| JSON Object |Define a [`flattenSpec`](#flattenspec) to extract
nested values from a Parquet file. Note that only 'path' expression are
supported ('jq' is unavailable).| no (default will auto-discover 'root' level
properties) |
-| binaryAsString | Boolean | Specifies if the bytes parquet column which is
not logically marked as a string or enum type should be treated as a UTF-8
encoded string. | no (default = false) |
+To use the Avro Stream input format, load the Druid Avro extension
([`druid-avro-extensions`](../development/extensions-core/avro.md)).
-### Avro Stream
+For more information on how Druid handles Avro types, see the [Avro
Types](../development/extensions-core/avro.md#avro-types) section.
-> You need to include the
[`druid-avro-extensions`](../development/extensions-core/avro.md) as an
extension to use the Avro Stream input format.
+Configure the Avro `inputFormat` to load Avro data as follows:
-> See the [Avro Types](../development/extensions-core/avro.md#avro-types)
section for how Avro types are handled in Druid
+| Field | Type | Description | Required |
+|-------|------|-------------|----------|
+|type| String| This should be set to `avro_stream` to read Avro serialized
data| yes |
+|flattenSpec| JSON Object |Define a [`flattenSpec`](#flattenspec) to extract
nested values from a Avro record. Note that only 'path' expression are
supported ('jq' is unavailable).| no (default will auto-discover 'root' level
properties) |
Review comment:
```suggestion
|flattenSpec| JSON Object |Define a [`flattenSpec`](#flattenspec) to extract
nested values from an Avro record. Note that only 'path' expressions are
supported; 'jq' is unavailable.| no (default will auto-discover 'root' level
properties) |
```
##########
File path: docs/querying/query-context.md
##########
@@ -61,6 +61,7 @@ Unless otherwise noted, the following parameters apply to all
query types.
|useFilterCNF|`false`| If true, Druid will attempt to convert the query filter
to Conjunctive Normal Form (CNF). During query processing, columns can be
pre-filtered by intersecting the bitmap indexes of all values that match the
eligible filters, often greatly reducing the raw number of rows which need to
be scanned. But this effect only happens for the top level filter, or
individual clauses of a top level 'and' filter. As such, filters in CNF
potentially have a higher chance to utilize a large amount of bitmap indexes on
string columns during pre-filtering. However, this setting should be used with
great caution, as it can sometimes have a negative effect on performance, and
in some cases, the act of computing CNF of a filter can be expensive. We
recommend hand tuning your filters to produce an optimal form if possible, or
at least verifying through experimentation that using this parameter actually
improves your query performance with no ill-effects.|
|secondaryPartitionPruning|`true`|Enable secondary partition pruning on the
Broker. The Broker will always prune unnecessary segments from the input scan
based on a filter on time intervals, but if the data is further partitioned
with hash or range partitioning, this option will enable additional pruning
based on a filter on secondary partition dimensions.|
|enableJoinLeftTableScanDirect|`false`|This flag applies to queries which have
joins. For joins, where left child is a simple scan with a filter, by default,
druid will run the scan as a query and the join the results to the right child
on broker. Setting this flag to true overrides that behavior and druid will
attempt to push the join to data servers instead. Please note that the flag
could be applicable to queries even if there is no explicit join. since queries
can internally translated into a join by the SQL planner.|
+|debug| `false` | Flag indicating whether to enable debugging outputs for the
query. When set to false, no additional logs will be produced (logs produced
will be entirely dependent on your logging level). When set to true, the
following addition logs will be produced:<br />- Log the stack trace of the
exception (if any) produced by the query |
Review comment:
```suggestion
|debug| `false` | Flag indicating whether to enable debugging outputs for
the query. When set to false, no additional logs will be produced. (Logs
produced will be entirely dependent on your logging level.) When set to true,
the following additional logs will be produced:<br />- Log the stack trace of
the exception (if any) produced by the query |
```
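For context, the `debug` flag is passed like any other context parameter. A minimal sketch of a native query enabling it (the datasource and interval are borrowed from examples elsewhere in this PR; the rest of the query is illustrative):

```json
{
  "queryType": "timeseries",
  "dataSource": "wikipedia",
  "intervals": ["2013-08-31/2013-09-01"],
  "granularity": "all",
  "aggregations": [{ "type": "count", "name": "count" }],
  "context": { "debug": true }
}
```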
##########
File path: docs/ingestion/ingestion-spec.md
##########
@@ -0,0 +1,464 @@
+---
+id: ingestion-spec
+title: Ingestion spec reference
+sidebar_label: Ingestion spec
+description: Reference for the configuration options in the ingestion spec.
+---
+
+All ingestion methods use ingestion tasks to load data into Druid. Streaming
ingestion uses ongoing supervisors that run and supervise a set of tasks over
time. Native batch and Hadoop-based ingestion use a one-time [task](tasks.md).
All types of ingestion use an _ingestion spec_ to configure ingestion.
+
+Ingestion specs consist of three main components:
+
+- [`dataSchema`](#dataschema), which configures the [datasource
name](#datasource),
+ [primary timestamp](#timestampspec), [dimensions](#dimensionsspec),
[metrics](#metricsspec), and [transforms and filters](#transformspec) (if
needed).
+- [`ioConfig`](#ioconfig), which tells Druid how to connect to the source
system and how to parse data. For more information, see the
+ documentation for each [ingestion method](./index.md#ingestion-methods).
+- [`tuningConfig`](#tuningconfig), which controls various tuning parameters
specific to each
+ [ingestion method](./index.md#ingestion-methods).
+
+Example ingestion spec for task type `index_parallel` (native batch):
+
+```
+{
+ "type": "index_parallel",
+ "spec": {
+ "dataSchema": {
+ "dataSource": "wikipedia",
+ "timestampSpec": {
+ "column": "timestamp",
+ "format": "auto"
+ },
+ "dimensionsSpec": {
+ "dimensions": [
+        { "type": "string", "name": "page" },
+        { "type": "string", "name": "language" },
+ { "type": "long", "name": "userId" }
+ ]
+ },
+ "metricsSpec": [
+ { "type": "count", "name": "count" },
+ { "type": "doubleSum", "name": "bytes_added_sum", "fieldName":
"bytes_added" },
+ { "type": "doubleSum", "name": "bytes_deleted_sum", "fieldName":
"bytes_deleted" }
+ ],
+ "granularitySpec": {
+ "segmentGranularity": "day",
+ "queryGranularity": "none",
+ "intervals": [
+ "2013-08-31/2013-09-01"
+ ]
+ }
+ },
+ "ioConfig": {
+ "type": "index_parallel",
+ "inputSource": {
+ "type": "local",
+ "baseDir": "examples/indexing/",
+ "filter": "wikipedia_data.json"
+ },
+ "inputFormat": {
+ "type": "json",
+ "flattenSpec": {
+ "useFieldDiscovery": true,
+ "fields": [
+ { "type": "path", "name": "userId", "expr": "$.user.id" }
+ ]
+ }
+ }
+ },
+ "tuningConfig": {
+ "type": "index_parallel"
+ }
+ }
+}
+```
+
+The specific options supported by these sections will depend on the [ingestion
method](./index.md#ingestion-methods) you have chosen.
+For more examples, refer to the documentation for each ingestion method.
+
+You can also load data visually, without the need to write an ingestion spec,
using the "Load data" functionality
+available in Druid's [web console](../operations/druid-console.md). Druid's
visual data loader supports
+[Kafka](../development/extensions-core/kafka-ingestion.md),
+[Kinesis](../development/extensions-core/kinesis-ingestion.md), and
+[native batch](native-batch.md) mode.
+
+## `dataSchema`
+
+> The `dataSchema` spec has been changed in 0.17.0. The new spec is supported
by all ingestion methods
+except for _Hadoop_ ingestion. See the [Legacy `dataSchema`
spec](#legacy-dataschema-spec) for the old spec.
+
+The `dataSchema` is a holder for the following components:
+
+- [datasource name](#datasource)
+- [primary timestamp](#timestampspec)
+- [dimensions](#dimensionsspec)
+- [metrics](#metricsspec)
+- [transforms and filters](#transformspec) (if needed).
+
+An example `dataSchema` is:
+
+```
+"dataSchema": {
+ "dataSource": "wikipedia",
+ "timestampSpec": {
+ "column": "timestamp",
+ "format": "auto"
+ },
+ "dimensionsSpec": {
+ "dimensions": [
+      { "type": "string", "name": "page" },
+      { "type": "string", "name": "language" },
+ { "type": "long", "name": "userId" }
+ ]
+ },
+ "metricsSpec": [
+ { "type": "count", "name": "count" },
+ { "type": "doubleSum", "name": "bytes_added_sum", "fieldName":
"bytes_added" },
+ { "type": "doubleSum", "name": "bytes_deleted_sum", "fieldName":
"bytes_deleted" }
+ ],
+ "granularitySpec": {
+ "segmentGranularity": "day",
+ "queryGranularity": "none",
+ "intervals": [
+ "2013-08-31/2013-09-01"
+ ]
+ }
+}
+```
+
+### `dataSource`
+
+The `dataSource` is located in `dataSchema` → `dataSource` and is simply the
name of the
+[datasource](../design/architecture.md#datasources-and-segments) that data
will be written to. An example
+`dataSource` is:
+
+```
+"dataSource": "my-first-datasource"
+```
+
+### `timestampSpec`
+
+The `timestampSpec` is located in `dataSchema` → `timestampSpec` and is
responsible for
+configuring the [primary timestamp](./data-model.md#primary-timestamp). An
example `timestampSpec` is:
+
+```
+"timestampSpec": {
+ "column": "timestamp",
+ "format": "auto"
+}
+```
+
+> Conceptually, after input data records are read, Druid applies ingestion
spec components in a particular order:
+> first [`flattenSpec`](data-formats.md#flattenspec) (if any), then
[`timestampSpec`](#timestampspec), then [`transformSpec`](#transformspec),
+> and finally [`dimensionsSpec`](#dimensionsspec) and
[`metricsSpec`](#metricsspec). Keep this in mind when writing
+> your ingestion spec.
+
+A `timestampSpec` can have the following components:
+
+|Field|Description|Default|
+|-----|-----------|-------|
+|column|Input row field to read the primary timestamp from.<br><br>Regardless
of the name of this input field, the primary timestamp will always be stored as
a column named `__time` in your Druid datasource.|timestamp|
+|format|Timestamp format. Options are: <ul><li>`iso`: ISO8601 with 'T'
separator, like "2000-01-01T01:02:03.456"</li><li>`posix`: seconds since
epoch</li><li>`millis`: milliseconds since epoch</li><li>`micro`: microseconds
since epoch</li><li>`nano`: nanoseconds since epoch</li><li>`auto`:
automatically detects ISO (either 'T' or space separator) or millis
format</li><li>any [Joda DateTimeFormat
string](http://joda-time.sourceforge.net/apidocs/org/joda/time/format/DateTimeFormat.html)</li></ul>|auto|
+|missingValue|Timestamp to use for input records that have a null or missing
timestamp `column`. Should be in ISO8601 format, like
`"2000-01-01T01:02:03.456"`, even if you have specified something else for
`format`. Since Druid requires a primary timestamp, this setting can be useful
for ingesting datasets that do not have any per-record timestamps at all. |none|
+
+### `dimensionsSpec`
+
+The `dimensionsSpec` is located in `dataSchema` → `dimensionsSpec` and is
responsible for
+configuring [dimensions](./data-model.md#dimensions). An example
`dimensionsSpec` is:
+
+```
+"dimensionsSpec" : {
+ "dimensions": [
+ "page",
+ "language",
+ { "type": "long", "name": "userId" }
+ ],
+ "dimensionExclusions" : [],
+ "spatialDimensions" : []
+}
+```
+
+> Conceptually, after input data records are read, Druid applies ingestion
spec components in a particular order:
+> first [`flattenSpec`](data-formats.md#flattenspec) (if any), then
[`timestampSpec`](#timestampspec), then [`transformSpec`](#transformspec),
+> and finally [`dimensionsSpec`](#dimensionsspec) and
[`metricsSpec`](#metricsspec). Keep this in mind when writing
+> your ingestion spec.
+
+A `dimensionsSpec` can have the following components:
+
+| Field | Description | Default |
+|-------|-------------|---------|
+| dimensions | A list of [dimension names or objects](#dimension-objects).
Cannot have the same column in both `dimensions` and
`dimensionExclusions`.<br><br>If this and `spatialDimensions` are both null or
empty arrays, Druid will treat all non-timestamp, non-metric columns that do
not appear in `dimensionExclusions` as String-typed dimension columns. See
[inclusions and exclusions](#inclusions-and-exclusions) below for details. |
`[]` |
+| dimensionExclusions | The names of dimensions to exclude from ingestion.
Only names are supported here, not objects.<br><br>This list is only used if
the `dimensions` and `spatialDimensions` lists are both null or empty arrays;
otherwise it is ignored. See [inclusions and
exclusions](#inclusions-and-exclusions) below for details. | `[]` |
+| spatialDimensions | An array of [spatial dimensions](../development/geo.md).
| `[]` |
+
+#### Dimension objects
+
+Each dimension in the `dimensions` list can either be a name or an object.
Providing a name is equivalent to providing
+a `string` type dimension object with the given name, e.g. `"page"` is
equivalent to `{"name": "page", "type": "string"}`.
+
+Dimension objects can have the following components:
+
+| Field | Description | Default |
+|-------|-------------|---------|
+| type | Either `string`, `long`, `float`, or `double`. | `string` |
+| name | The name of the dimension. This will be used as the field name to
read from input records, as well as the column name stored in generated
segments.<br><br>Note that you can use a [`transformSpec`](#transformspec) if
you want to rename columns during ingestion time. | none (required) |
+| createBitmapIndex | For `string` typed dimensions, whether or not bitmap
indexes should be created for the column in generated segments. Creating a
bitmap index requires more storage, but speeds up certain kinds of filtering
(especially equality and prefix filtering). Only supported for `string` typed
dimensions. | `true` |
+| multiValueHandling | Specify the type of handling for [multi-value
fields](../querying/multi-value-dimensions.md). Possible values are
`sorted_array`, `sorted_set`, and `array`. `sorted_array` and `sorted_set`
order the array upon ingestion. `sorted_set` removes duplicates. `array`
ingests data as-is | `sorted_array` |
+
+#### Inclusions and exclusions
+
+Druid will interpret a `dimensionsSpec` in two possible ways: _normal_ or
_schemaless_.
+
+Normal interpretation occurs when either `dimensions` or `spatialDimensions`
is non-empty. In this case, the combination of the two lists will be taken as
the set of dimensions to be ingested, and the list of `dimensionExclusions`
will be ignored.
+
+Schemaless interpretation occurs when both `dimensions` and
`spatialDimensions` are empty or null. In this case, the set of dimensions is
determined in the following way:
+
+1. First, start from the set of all root-level fields from the input record,
as determined by the [`inputFormat`](./data-formats.md). "Root-level" includes
all fields at the top level of a data structure, but does not include fields
nested within maps or lists. To extract these, you must use a
[`flattenSpec`](./data-formats.md#flattenspec). All fields of non-nested data
formats, such as CSV and delimited text, are considered root-level.
+2. If a [`flattenSpec`](./data-formats.md#flattenspec) is being used, the set
of root-level fields includes any fields generated by the flattenSpec. The
useFieldDiscovery parameter determines whether the original root-level fields
will be retained or discarded.
Review comment:
```suggestion
2. If a [`flattenSpec`](./data-formats.md#flattenspec) is being used, the
set of root-level fields includes any fields generated by the `flattenSpec`.
The `useFieldDiscovery` parameter determines whether the original root-level
fields will be retained or discarded.
```
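As an aside, the schemaless discovery described in this hunk can be approximated in a few lines. This is an illustrative sketch, not Druid's implementation; the function name and sample record are hypothetical:

```python
def discover_dimensions(record, exclusions, timestamp_column="timestamp"):
    """Sketch of schemaless discovery: keep root-level fields, dropping
    the timestamp column and anything in dimensionExclusions."""
    return sorted(
        field for field in record
        if field != timestamp_column and field not in exclusions
    )

# Only root-level fields are discovered; the nested "id" under "user"
# would need a flattenSpec to become its own column.
record = {"timestamp": "2013-08-31T01:02:33Z", "page": "Gypsy Danger",
          "language": "en", "user": {"id": 1338}}
print(discover_dimensions(record, exclusions=["language"]))  # ['page', 'user']
```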
##########
File path: docs/querying/multi-value-dimensions.md
##########
@@ -22,9 +22,21 @@ title: "Multi-value dimensions"
~ under the License.
-->
+<<<<<<< HEAD
Review comment:
uh oh, this looks familiar. I think lines 27 through 39 can just be
replaced with
```
Apache Druid supports "multi-value" string dimensions. These are generated
when an input field contains an
array of values instead of a single value (e.g. JSON arrays, or a TSV field
containing one or more `listDelimiter` characters).
By default, Druid stores the values in alphabetical order; see [Dimension
Objects](../ingestion/ingestion-spec.md#dimension-objects) for configuration.
```
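For what it's worth, the alphabetical default mentioned here comes from the `multiValueHandling` modes in the dimension-objects table. A rough sketch of the three modes (illustrative only; the function name and sample values are mine, not Druid's code):

```python
def handle_multi_value(values, mode="sorted_array"):
    """Sketch of the three multiValueHandling modes described in the docs."""
    if mode == "sorted_array":
        return sorted(values)       # default: order values on ingestion
    if mode == "sorted_set":
        return sorted(set(values))  # order values and drop duplicates
    if mode == "array":
        return list(values)         # keep input order as-is
    raise ValueError(f"unknown multiValueHandling mode: {mode}")

tags = ["b", "a", "b"]
print(handle_multi_value(tags, "sorted_array"))  # ['a', 'b', 'b']
print(handle_multi_value(tags, "sorted_set"))    # ['a', 'b']
```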
##########
File path: docs/development/extensions-core/protobuf.md
##########
@@ -112,82 +112,86 @@ Important supervisor properties
- `protoBytesDecoder.descriptor` for the descriptor file URL
- `protoBytesDecoder.protoMessageType` from the proto definition
- `protoBytesDecoder.type` set to `file`, indicate use descriptor file to
decode Protobuf file
-- `parser` should have `type` set to `protobuf`, but note that the `format` of
the `parseSpec` must be `json`
+- `inputFormat` should have `type` set to `protobuf`
```json
{
- "type": "kafka",
- "dataSchema": {
- "dataSource": "metrics-protobuf",
- "parser": {
- "type": "protobuf",
- "protoBytesDecoder": {
- "type": "file",
- "descriptor": "file:///tmp/metrics.desc",
- "protoMessageType": "Metrics"
- },
- "parseSpec": {
- "format": "json",
+"type": "kafka",
+"spec": {
+ "dataSchema": {
+ "dataSource": "metrics-protobuf",
"timestampSpec": {
- "column": "timestamp",
- "format": "auto"
+ "column": "timestamp",
+ "format": "auto"
},
"dimensionsSpec": {
- "dimensions": [
- "unit",
- "http_method",
- "http_code",
- "page",
- "metricType",
- "server"
- ],
- "dimensionExclusions": [
- "timestamp",
- "value"
- ]
+ "dimensions": [
+ "unit",
+ "http_method",
+ "http_code",
+ "page",
+ "metricType",
+ "server"
+ ],
+ "dimensionExclusions": [
+ "timestamp",
+ "value"
+ ]
+ },
+ "metricsSpec": [
+ {
+ "name": "count",
+ "type": "count"
+ },
+ {
+ "name": "value_sum",
+ "fieldName": "value",
+ "type": "doubleSum"
+ },
+ {
+ "name": "value_min",
+ "fieldName": "value",
+ "type": "doubleMin"
+ },
+ {
+ "name": "value_max",
+ "fieldName": "value",
+ "type": "doubleMax"
+ }
+ ],
+ "granularitySpec": {
+ "type": "uniform",
+ "segmentGranularity": "HOUR",
+ "queryGranularity": "NONE"
}
- }
},
- "metricsSpec": [
- {
- "name": "count",
- "type": "count"
- },
- {
- "name": "value_sum",
- "fieldName": "value",
- "type": "doubleSum"
- },
- {
- "name": "value_min",
- "fieldName": "value",
- "type": "doubleMin"
- },
- {
- "name": "value_max",
- "fieldName": "value",
- "type": "doubleMax"
- }
- ],
- "granularitySpec": {
- "type": "uniform",
- "segmentGranularity": "HOUR",
- "queryGranularity": "NONE"
- }
- },
- "tuningConfig": {
- "type": "kafka",
- "maxRowsPerSegment": 5000000
- },
- "ioConfig": {
- "topic": "metrics_pb",
- "consumerProperties": {
- "bootstrap.servers": "localhost:9092"
+ "tuningConfig": {
+ "type": "kafka",
+ "maxRowsPerSegment": 5000000
},
- "taskCount": 1,
- "replicas": 1,
- "taskDuration": "PT1H"
- }
+ "ioConfig": {
+ "topic": "metrics_pb",
+ "consumerProperties": {
+ "bootstrap.servers": "localhost:9092"
+ },
+ "inputFormat": {
+ "type": "protobuf",
+ "protoBytesDecoder": {
+ "type": "file",
+ "descriptor": "file:///tmp/metrics.desc",
+ "protoMessageType": "Metrics"
+ },
+ "flattenSpec": {
+ "useFieldDiscovery": true
+ },
+ "binaryAsString": false
+ },
+ "taskCount": 1,
+ "replicas": 1,
+ "taskDuration": "PT1H",
+ "type": "kafka"
+ }
+}
Review comment:
An extra `}` may have gotten added here.
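Imbalances like this are easy to catch mechanically: running the example through any JSON parser flags the stray brace. A minimal sketch (the helper name is mine, not part of the docs):

```python
import json

def check_json(text):
    """Return None if text parses as JSON, otherwise the parser's error."""
    try:
        json.loads(text)
        return None
    except json.JSONDecodeError as err:
        return str(err)

print(check_json('{"type": "kafka"}'))   # balanced: prints None
print(check_json('{"type": "kafka"}}'))  # extra brace: reports "Extra data"
```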
##########
File path: docs/ingestion/data-formats.md
##########
@@ -422,11 +427,20 @@ Multiple Instances:
### Avro OCF
-> You need to include the
[`druid-avro-extensions`](../development/extensions-core/avro.md) as an
extension to use the Avro OCF input format.
+To use the Avro OCF input format, load the Druid Avro extension
([`druid-avro-extensions`](../development/extensions-core/avro.md)).
-> See the [Avro Types](../development/extensions-core/avro.md#avro-types)
section for how Avro types are handled in Druid
+See the [Avro Types](../development/extensions-core/avro.md#avro-types)
section for how Avro types are handled in Druid.
-The `inputFormat` to load data of Avro OCF format. An example is:
+Configure the Avro OCF `inputFormat` to load Avro OCF data as follows:
+
+| Field | Type | Description | Required |
+|-------|------|-------------|----------|
+|type| String| This should be set to `avro_ocf` to read Avro OCF file| yes |
+|flattenSpec| JSON Object |Define a [`flattenSpec`](#flattenspec) to extract
nested values from a Avro records. Note that only 'path' expression are
supported ('jq' is unavailable).| no (default will auto-discover 'root' level
properties) |
Review comment:
```suggestion
|flattenSpec| JSON Object |Define a [`flattenSpec`](#flattenspec) to extract
nested values from Avro records. Note that only 'path' expressions are
supported; 'jq' is unavailable.| no (default will auto-discover 'root' level
properties) |
```
##########
File path: docs/ingestion/data-formats.md
##########
@@ -498,18 +513,18 @@ The `inputFormat` to load data of Protobuf format. An
example is:
}
```
-| Field | Type | Description | Required |
-|-------|------|-------------|----------|
-|type| String| This should be set to `protobuf` to read Protobuf serialized
data| yes |
-|flattenSpec| JSON Object |Define a [`flattenSpec`](#flattenspec) to extract
nested values from a Protobuf record. Note that only 'path' expression are
supported ('jq' is unavailable).| no (default will auto-discover 'root' level
properties) |
-|`protoBytesDecoder`| JSON Object |Specifies how to decode bytes to Protobuf
record. | yes |
-
### FlattenSpec
-The `flattenSpec` is located in `inputFormat` → `flattenSpec` and is
responsible for
-bridging the gap between potentially nested input data (such as JSON, Avro,
etc) and Druid's flat data model.
-An example `flattenSpec` is:
+The `flattenSpec` bridges the gap between potentially nested input data (such
as JSON, Avro, etc) and Druid's flat data model. It is a object within the
`inputFormat` object.
Review comment:
```suggestion
The `flattenSpec` bridges the gap between potentially nested input data
(such as JSON, Avro, etc) and Druid's flat data model. It is an object within
the `inputFormat` object.
```
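To sketch what that bridging looks like in practice: each 'path' field maps a nested value to a flat column name. The toy resolver below handles only simple dotted paths (real flattenSpec 'path' expressions support full JsonPath); all names are illustrative:

```python
def resolve_path(record, expr):
    """Resolve a simple '$.a.b' dotted path into a nested dict.
    Real 'path' expressions are JsonPath; this toy version is not."""
    node = record
    for key in expr.split(".")[1:]:  # drop the leading "$"
        node = node[key]
    return node

def flatten(record, fields):
    """Map nested input to flat columns, as a flattenSpec's fields do."""
    return {f["name"]: resolve_path(record, f["expr"]) for f in fields}

record = {"user": {"id": 1338}, "page": "Gypsy Danger"}
fields = [{"type": "path", "name": "userId", "expr": "$.user.id"}]
print(flatten(record, fields))  # {'userId': 1338}
```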
##########
File path: docs/ingestion/native-batch.md
##########
@@ -772,10 +773,10 @@ The tuningConfig is optional and default parameters will
be used if no tuningCon
|numShards|Deprecated. Use `partitionsSpec` instead. Directly specify the
number of shards to create. If this is specified and `intervals` is specified
in the `granularitySpec`, the index task can skip the determine
intervals/partitions pass through the data. `numShards` cannot be specified if
`maxRowsPerSegment` is set.|null|no|
|partitionDimensions|Deprecated. Use `partitionsSpec` instead. The dimensions
to partition on. Leave blank to select all dimensions. Only used with
`forceGuaranteedRollup` = true, will be ignored otherwise.|null|no|
|partitionsSpec|Defines how to partition data in each timeChunk, see
[PartitionsSpec](#partitionsspec)|`dynamic` if `forceGuaranteedRollup` = false,
`hashed` if `forceGuaranteedRollup` = true|no|
-|indexSpec|Defines segment storage format options to be used at indexing time,
see [IndexSpec](index.md#indexspec)|null|no|
-|indexSpecForIntermediatePersists|Defines segment storage format options to be
used at indexing time for intermediate persisted temporary segments. this can
be used to disable dimension/metric compression on intermediate segments to
reduce memory required for final merging. however, disabling compression on
intermediate segments might increase page cache use while they are used before
getting merged into final segment published, see
[IndexSpec](index.md#indexspec) for possible values.|same as indexSpec|no|
+|indexSpec|Defines segment storage format options to be used at indexing time,
see [IndexSpec](ingestion-spec.md#indexspec)|null|no|
+|indexSpecForIntermediatePersists|Defines segment storage format options to be
used at indexing time for intermediate persisted temporary segments. this can
be used to disable dimension/metric compression on intermediate segments to
reduce memory required for final merging. however, disabling compression on
intermediate segments might increase page cache use while they are used before
getting merged into final segment published, see
[IndexSpec](ingestion-spec.md#indexspec) for possible values.|same as
indexSpec|no|
Review comment:
```suggestion
|indexSpecForIntermediatePersists|Defines segment storage format options to
be used at indexing time for intermediate persisted temporary segments. This
can be used to disable dimension/metric compression on intermediate segments to
reduce memory required for final merging. However, disabling compression on
intermediate segments might increase page cache use while they are used before
getting merged into final segment published. See
[`IndexSpec`](ingestion-spec.md#indexspec) for possible values.|same as
`indexSpec`|no|
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]