sthetland commented on a change in pull request #11541:
URL: https://github.com/apache/druid/pull/11541#discussion_r686386422
##########
File path: docs/ingestion/index.md
##########
@@ -59,6 +50,8 @@ The most recommended, and most popular, method of streaming
ingestion is the
[Kafka indexing service](../development/extensions-core/kafka-ingestion.md)
that reads directly from Kafka. Alternatively, the Kinesis
indexing service works with Amazon Kinesis Data Streams.
+Streaming ingestion uses an onging process called a supervisor that reads from
the data stream to ingest data into Druid.
Review comment:
```suggestion
Streaming ingestion uses an ongoing process called a supervisor, which reads
from the data stream to ingest data into Druid.
```
Or "Streaming ingestion uses an ongoing process called a supervisor, which
ingests data into Druid by reading from data streams."
##########
File path: docs/ingestion/data-formats.md
##########
@@ -215,21 +218,22 @@ The `inputFormat` to load data of Parquet format. An
example is:
}
```
-The Parquet `inputFormat` has the following components:
+### Avro Stream
-| Field | Type | Description | Required |
-|-------|------|-------------|----------|
-|type| String| This should be set to `parquet` to read Parquet file| yes |
-|flattenSpec| JSON Object |Define a [`flattenSpec`](#flattenspec) to extract
nested values from a Parquet file. Note that only 'path' expression are
supported ('jq' is unavailable).| no (default will auto-discover 'root' level
properties) |
-| binaryAsString | Boolean | Specifies if the bytes parquet column which is
not logically marked as a string or enum type should be treated as a UTF-8
encoded string. | no (default = false) |
+To use the Avro Stream input format, load the Druid Avro extension
([`druid-avro-extensions`](../development/extensions-core/avro.md)).
-### Avro Stream
+For more information on how Druid handles Avro types, see the [Avro
Types](../development/extensions-core/avro.md#avro-types) section.
-> You need to include the
[`druid-avro-extensions`](../development/extensions-core/avro.md) as an
extension to use the Avro Stream input format.
+Configure the Avro `inputFormat` to load Avro data as follows:
-> See the [Avro Types](../development/extensions-core/avro.md#avro-types)
section for how Avro types are handled in Druid
+| Field | Type | Description | Required |
+|-------|------|-------------|----------|
+|type| String| This should be set to `avro_stream` to read Avro serialized
data| yes |
+|flattenSpec| JSON Object |Define a [`flattenSpec`](#flattenspec) to extract
nested values from a Avro record. Note that only 'path' expression are
supported ('jq' is unavailable).| no (default will auto-discover 'root' level
properties) |
Review comment:
```suggestion
|flattenSpec| JSON Object |Define a [`flattenSpec`](#flattenspec) to extract
nested values from an Avro record. Note that only 'path' expressions are
supported; 'jq' is unavailable.| no (default will auto-discover 'root' level
properties) |
```
##########
File path: docs/querying/query-context.md
##########
@@ -61,6 +61,7 @@ Unless otherwise noted, the following parameters apply to all
query types.
|useFilterCNF|`false`| If true, Druid will attempt to convert the query filter
to Conjunctive Normal Form (CNF). During query processing, columns can be
pre-filtered by intersecting the bitmap indexes of all values that match the
eligible filters, often greatly reducing the raw number of rows which need to
be scanned. But this effect only happens for the top level filter, or
individual clauses of a top level 'and' filter. As such, filters in CNF
potentially have a higher chance to utilize a large amount of bitmap indexes on
string columns during pre-filtering. However, this setting should be used with
great caution, as it can sometimes have a negative effect on performance, and
in some cases, the act of computing CNF of a filter can be expensive. We
recommend hand tuning your filters to produce an optimal form if possible, or
at least verifying through experimentation that using this parameter actually
improves your query performance with no ill-effects.|
|secondaryPartitionPruning|`true`|Enable secondary partition pruning on the
Broker. The Broker will always prune unnecessary segments from the input scan
based on a filter on time intervals, but if the data is further partitioned
with hash or range partitioning, this option will enable additional pruning
based on a filter on secondary partition dimensions.|
|enableJoinLeftTableScanDirect|`false`|This flag applies to queries which have
joins. For joins, where left child is a simple scan with a filter, by default,
druid will run the scan as a query and the join the results to the right child
on broker. Setting this flag to true overrides that behavior and druid will
attempt to push the join to data servers instead. Please note that the flag
could be applicable to queries even if there is no explicit join. since queries
can internally translated into a join by the SQL planner.|
+|debug| `false` | Flag indicating whether to enable debugging outputs for the
query. When set to false, no additional logs will be produced (logs produced
will be entirely dependent on your logging level). When set to true, the
following addition logs will be produced:<br />- Log the stack trace of the
exception (if any) produced by the query |
Review comment:
```suggestion
|debug| `false` | Flag indicating whether to enable debugging outputs for
the query. When set to false, no additional logs will be produced. (Logs
produced will be entirely dependent on your logging level.) When set to true,
the following additional logs will be produced:<br />- Log the stack trace of
the exception (if any) produced by the query |
```
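For context, the `debug` flag is passed like any other context parameter. A minimal sketch of a native query enabling it (the datasource and interval are borrowed from examples elsewhere in this PR; the rest of the query is illustrative):

```json
{
  "queryType": "timeseries",
  "dataSource": "wikipedia",
  "intervals": ["2013-08-31/2013-09-01"],
  "granularity": "all",
  "aggregations": [{ "type": "count", "name": "count" }],
  "context": { "debug": true }
}
```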
##########
File path: docs/ingestion/ingestion-spec.md
##########
@@ -0,0 +1,464 @@
+---
+id: ingestion-spec
+title: Ingestion spec reference
+sidebar_label: Ingestion spec
+description: Reference for the configuration options in the ingestion spec.
+---
+
+All ingestion methods use ingestion tasks to load data into Druid. Streaming
ingestion uses ongoing supervisors that run and supervise a set of tasks over
time. Native batch and Hadoop-based ingestion use a one-time [task](tasks.md).
All types of ingestion use an _ingestion spec_ to configure ingestion.
+
+Ingestion specs consist of three main components:
+
+- [`dataSchema`](#dataschema), which configures the [datasource
name](#datasource),
+ [primary timestamp](#timestampspec), [dimensions](#dimensionsspec),
[metrics](#metricsspec), and [transforms and filters](#transformspec) (if
needed).
+- [`ioConfig`](#ioconfig), which tells Druid how to connect to the source
system and how to parse data. For more information, see the
+ documentation for each [ingestion method](./index.md#ingestion-methods).
+- [`tuningConfig`](#tuningconfig), which controls various tuning parameters
specific to each
+ [ingestion method](./index.md#ingestion-methods).
+
+Example ingestion spec for task type `index_parallel` (native batch):
+
+```
+{
+ "type": "index_parallel",
+ "spec": {
+ "dataSchema": {
+ "dataSource": "wikipedia",
+ "timestampSpec": {
+ "column": "timestamp",
+ "format": "auto"
+ },
+ "dimensionsSpec": {
+ "dimensions": [
+        { "type": "string", "name": "page" },
+        { "type": "string", "name": "language" },
+ { "type": "long", "name": "userId" }
+ ]
+ },
+ "metricsSpec": [
+ { "type": "count", "name": "count" },
+ { "type": "doubleSum", "name": "bytes_added_sum", "fieldName":
"bytes_added" },
+ { "type": "doubleSum", "name": "bytes_deleted_sum", "fieldName":
"bytes_deleted" }
+ ],
+ "granularitySpec": {
+ "segmentGranularity": "day",
+ "queryGranularity": "none",
+ "intervals": [
+ "2013-08-31/2013-09-01"
+ ]
+ }
+ },
+ "ioConfig": {
+ "type": "index_parallel",
+ "inputSource": {
+ "type": "local",
+ "baseDir": "examples/indexing/",
+ "filter": "wikipedia_data.json"
+ },
+ "inputFormat": {
+ "type": "json",
+ "flattenSpec": {
+ "useFieldDiscovery": true,
+ "fields": [
+ { "type": "path", "name": "userId", "expr": "$.user.id" }
+ ]
+ }
+ }
+ },
+ "tuningConfig": {
+ "type": "index_parallel"
+ }
+ }
+}
+```
+
+The specific options supported by these sections will depend on the [ingestion
method](./index.md#ingestion-methods) you have chosen.
+For more examples, refer to the documentation for each ingestion method.
+
+You can also load data visually, without the need to write an ingestion spec,
using the "Load data" functionality
+available in Druid's [web console](../operations/druid-console.md). Druid's
visual data loader supports
+[Kafka](../development/extensions-core/kafka-ingestion.md),
+[Kinesis](../development/extensions-core/kinesis-ingestion.md), and
+[native batch](native-batch.md) mode.
+
+## `dataSchema`
+
+> The `dataSchema` spec has been changed in 0.17.0. The new spec is supported
by all ingestion methods
+except for _Hadoop_ ingestion. See the [Legacy `dataSchema`
spec](#legacy-dataschema-spec) for the old spec.
+
+The `dataSchema` is a holder for the following components:
+
+- [datasource name](#datasource)
+- [primary timestamp](#timestampspec)
+- [dimensions](#dimensionsspec)
+- [metrics](#metricsspec)
+- [transforms and filters](#transformspec) (if needed).
+
+An example `dataSchema` is:
+
+```
+"dataSchema": {
+ "dataSource": "wikipedia",
+ "timestampSpec": {
+ "column": "timestamp",
+ "format": "auto"
+ },
+ "dimensionsSpec": {
+ "dimensions": [
+      { "type": "string", "name": "page" },
+      { "type": "string", "name": "language" },
+ { "type": "long", "name": "userId" }
+ ]
+ },
+ "metricsSpec": [
+ { "type": "count", "name": "count" },
+ { "type": "doubleSum", "name": "bytes_added_sum", "fieldName":
"bytes_added" },
+ { "type": "doubleSum", "name": "bytes_deleted_sum", "fieldName":
"bytes_deleted" }
+ ],
+ "granularitySpec": {
+ "segmentGranularity": "day",
+ "queryGranularity": "none",
+ "intervals": [
+ "2013-08-31/2013-09-01"
+ ]
+ }
+}
+```
+
+### `dataSource`
+
+The `dataSource` is located in `dataSchema` → `dataSource` and is simply the
name of the
+[datasource](../design/architecture.md#datasources-and-segments) that data
will be written to. An example
+`dataSource` is:
+
+```
+"dataSource": "my-first-datasource"
+```
+
+### `timestampSpec`
+
+The `timestampSpec` is located in `dataSchema` → `timestampSpec` and is
responsible for
+configuring the [primary timestamp](./data-model.md#primary-timestamp). An
example `timestampSpec` is:
+
+```
+"timestampSpec": {
+ "column": "timestamp",
+ "format": "auto"
+}
+```
+
+> Conceptually, after input data records are read, Druid applies ingestion
spec components in a particular order:
+> first [`flattenSpec`](data-formats.md#flattenspec) (if any), then
[`timestampSpec`](#timestampspec), then [`transformSpec`](#transformspec),
+> and finally [`dimensionsSpec`](#dimensionsspec) and
[`metricsSpec`](#metricsspec). Keep this in mind when writing
+> your ingestion spec.
+
+A `timestampSpec` can have the following components:
+
+|Field|Description|Default|
+|-----|-----------|-------|
+|column|Input row field to read the primary timestamp from.<br><br>Regardless
of the name of this input field, the primary timestamp will always be stored as
a column named `__time` in your Druid datasource.|timestamp|
+|format|Timestamp format. Options are: <ul><li>`iso`: ISO8601 with 'T'
separator, like "2000-01-01T01:02:03.456"</li><li>`posix`: seconds since
epoch</li><li>`millis`: milliseconds since epoch</li><li>`micro`: microseconds
since epoch</li><li>`nano`: nanoseconds since epoch</li><li>`auto`:
automatically detects ISO (either 'T' or space separator) or millis
format</li><li>any [Joda DateTimeFormat
string](http://joda-time.sourceforge.net/apidocs/org/joda/time/format/DateTimeFormat.html)</li></ul>|auto|
+|missingValue|Timestamp to use for input records that have a null or missing
timestamp `column`. Should be in ISO8601 format, like
`"2000-01-01T01:02:03.456"`, even if you have specified something else for
`format`. Since Druid requires a primary timestamp, this setting can be useful
for ingesting datasets that do not have any per-record timestamps at all. |none|
+
+### `dimensionsSpec`
+
+The `dimensionsSpec` is located in `dataSchema` → `dimensionsSpec` and is
responsible for
+configuring [dimensions](./data-model.md#dimensions). An example
`dimensionsSpec` is:
+
+```
+"dimensionsSpec" : {
+ "dimensions": [
+ "page",
+ "language",
+ { "type": "long", "name": "userId" }
+ ],
+ "dimensionExclusions" : [],
+ "spatialDimensions" : []
+}
+```
+
+> Conceptually, after input data records are read, Druid applies ingestion
spec components in a particular order:
+> first [`flattenSpec`](data-formats.md#flattenspec) (if any), then
[`timestampSpec`](#timestampspec), then [`transformSpec`](#transformspec),
+> and finally [`dimensionsSpec`](#dimensionsspec) and
[`metricsSpec`](#metricsspec). Keep this in mind when writing
+> your ingestion spec.
+
+A `dimensionsSpec` can have the following components:
+
+| Field | Description | Default |
+|-------|-------------|---------|
+| dimensions | A list of [dimension names or objects](#dimension-objects).
Cannot have the same column in both `dimensions` and
`dimensionExclusions`.<br><br>If this and `spatialDimensions` are both null or
empty arrays, Druid will treat all non-timestamp, non-metric columns that do
not appear in `dimensionExclusions` as String-typed dimension columns. See
[inclusions and exclusions](#inclusions-and-exclusions) below for details. |
`[]` |
+| dimensionExclusions | The names of dimensions to exclude from ingestion.
Only names are supported here, not objects.<br><br>This list is only used if
the `dimensions` and `spatialDimensions` lists are both null or empty arrays;
otherwise it is ignored. See [inclusions and
exclusions](#inclusions-and-exclusions) below for details. | `[]` |
+| spatialDimensions | An array of [spatial dimensions](../development/geo.md).
| `[]` |
+
+#### Dimension objects
+
+Each dimension in the `dimensions` list can either be a name or an object.
Providing a name is equivalent to providing
+a `string` type dimension object with the given name, e.g. `"page"` is
equivalent to `{"name": "page", "type": "string"}`.
+
+Dimension objects can have the following components:
+
+| Field | Description | Default |
+|-------|-------------|---------|
+| type | Either `string`, `long`, `float`, or `double`. | `string` |
+| name | The name of the dimension. This will be used as the field name to
read from input records, as well as the column name stored in generated
segments.<br><br>Note that you can use a [`transformSpec`](#transformspec) if
you want to rename columns during ingestion time. | none (required) |
+| createBitmapIndex | For `string` typed dimensions, whether or not bitmap
indexes should be created for the column in generated segments. Creating a
bitmap index requires more storage, but speeds up certain kinds of filtering
(especially equality and prefix filtering). Only supported for `string` typed
dimensions. | `true` |
+| multiValueHandling | Specify the type of handling for [multi-value
fields](../querying/multi-value-dimensions.md). Possible values are
`sorted_array`, `sorted_set`, and `array`. `sorted_array` and `sorted_set`
order the array upon ingestion. `sorted_set` removes duplicates. `array`
ingests data as-is | `sorted_array` |
+
+#### Inclusions and exclusions
+
+Druid will interpret a `dimensionsSpec` in two possible ways: _normal_ or
_schemaless_.
+
+Normal interpretation occurs when either `dimensions` or `spatialDimensions`
is non-empty. In this case, the combination of the two lists will be taken as
the set of dimensions to be ingested, and the list of `dimensionExclusions`
will be ignored.
+
+Schemaless interpretation occurs when both `dimensions` and
`spatialDimensions` are empty or null. In this case, the set of dimensions is
determined in the following way:
+
+1. First, start from the set of all root-level fields from the input record,
as determined by the [`inputFormat`](./data-formats.md). "Root-level" includes
all fields at the top level of a data structure, but does not include fields
nested within maps or lists. To extract these, you must use a
[`flattenSpec`](./data-formats.md#flattenspec). All fields of non-nested data
formats, such as CSV and delimited text, are considered root-level.
+2. If a [`flattenSpec`](./data-formats.md#flattenspec) is being used, the set
of root-level fields includes any fields generated by the flattenSpec. The
useFieldDiscovery parameter determines whether the original root-level fields
will be retained or discarded.
Review comment:
```suggestion
2. If a [`flattenSpec`](./data-formats.md#flattenspec) is being used, the
set of root-level fields includes any fields generated by the `flattenSpec`.
The `useFieldDiscovery` parameter determines whether the original root-level
fields will be retained or discarded.
```
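As an aside, the schemaless discovery described in this hunk can be approximated in a few lines. This is an illustrative sketch, not Druid's implementation; the function name and sample record are hypothetical:

```python
def discover_dimensions(record, exclusions, timestamp_column="timestamp"):
    """Sketch of schemaless discovery: keep root-level fields, dropping
    the timestamp column and anything in dimensionExclusions."""
    return sorted(
        field for field in record
        if field != timestamp_column and field not in exclusions
    )

# Only root-level fields are discovered; the nested "id" under "user"
# would need a flattenSpec to become its own column.
record = {"timestamp": "2013-08-31T01:02:33Z", "page": "Gypsy Danger",
          "language": "en", "user": {"id": 1338}}
print(discover_dimensions(record, exclusions=["language"]))  # ['page', 'user']
```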
##########
File path: docs/querying/multi-value-dimensions.md
##########
@@ -22,9 +22,21 @@ title: "Multi-value dimensions"
~ under the License.
-->
+<<<<<<< HEAD
Review comment:
uh oh, this looks familiar. I think lines 27 through 39 can just be
replaced with
```
Apache Druid supports "multi-value" string dimensions. These are generated
when an input field contains an
array of values instead of a single value (e.g. JSON arrays, or a TSV field
containing one or more `listDelimiter` characters).
By default, Druid stores the values in alphabetical order; see [Dimension
Objects](../ingestion/ingestion-spec.md#dimension-objects) for configuration.
```
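For what it's worth, the alphabetical default mentioned here comes from the `multiValueHandling` modes in the dimension-objects table. A rough sketch of the three modes (illustrative only; the function name and sample values are mine, not Druid's code):

```python
def handle_multi_value(values, mode="sorted_array"):
    """Sketch of the three multiValueHandling modes described in the docs."""
    if mode == "sorted_array":
        return sorted(values)       # default: order values on ingestion
    if mode == "sorted_set":
        return sorted(set(values))  # order values and drop duplicates
    if mode == "array":
        return list(values)         # keep input order as-is
    raise ValueError(f"unknown multiValueHandling mode: {mode}")

tags = ["b", "a", "b"]
print(handle_multi_value(tags, "sorted_array"))  # ['a', 'b', 'b']
print(handle_multi_value(tags, "sorted_set"))    # ['a', 'b']
```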
##########
File path: docs/development/extensions-core/protobuf.md
##########
@@ -112,82 +112,86 @@ Important supervisor properties
- `protoBytesDecoder.descriptor` for the descriptor file URL
- `protoBytesDecoder.protoMessageType` from the proto definition
- `protoBytesDecoder.type` set to `file`, indicate use descriptor file to
decode Protobuf file
-- `parser` should have `type` set to `protobuf`, but note that the `format` of
the `parseSpec` must be `json`
+- `inputFormat` should have `type` set to `protobuf`
```json
{
- "type": "kafka",
- "dataSchema": {
- "dataSource": "metrics-protobuf",
- "parser": {
- "type": "protobuf",
- "protoBytesDecoder": {
- "type": "file",
- "descriptor": "file:///tmp/metrics.desc",
- "protoMessageType": "Metrics"
- },
- "parseSpec": {
- "format": "json",
+"type": "kafka",
+"spec": {
+ "dataSchema": {
+ "dataSource": "metrics-protobuf",
"timestampSpec": {
- "column": "timestamp",
- "format": "auto"
+ "column": "timestamp",
+ "format": "auto"
},
"dimensionsSpec": {
- "dimensions": [
- "unit",
- "http_method",
- "http_code",
- "page",
- "metricType",
- "server"
- ],
- "dimensionExclusions": [
- "timestamp",
- "value"
- ]
+ "dimensions": [
+ "unit",
+ "http_method",
+ "http_code",
+ "page",
+ "metricType",
+ "server"
+ ],
+ "dimensionExclusions": [
+ "timestamp",
+ "value"
+ ]
+ },
+ "metricsSpec": [
+ {
+ "name": "count",
+ "type": "count"
+ },
+ {
+ "name": "value_sum",
+ "fieldName": "value",
+ "type": "doubleSum"
+ },
+ {
+ "name": "value_min",
+ "fieldName": "value",
+ "type": "doubleMin"
+ },
+ {
+ "name": "value_max",
+ "fieldName": "value",
+ "type": "doubleMax"
+ }
+ ],
+ "granularitySpec": {
+ "type": "uniform",
+ "segmentGranularity": "HOUR",
+ "queryGranularity": "NONE"
}
- }
},
- "metricsSpec": [
- {
- "name": "count",
- "type": "count"
- },
- {
- "name": "value_sum",
- "fieldName": "value",
- "type": "doubleSum"
- },
- {
- "name": "value_min",
- "fieldName": "value",
- "type": "doubleMin"
- },
- {
- "name": "value_max",
- "fieldName": "value",
- "type": "doubleMax"
- }
- ],
- "granularitySpec": {
- "type": "uniform",
- "segmentGranularity": "HOUR",
- "queryGranularity": "NONE"
- }
- },
- "tuningConfig": {
- "type": "kafka",
- "maxRowsPerSegment": 5000000
- },
- "ioConfig": {
- "topic": "metrics_pb",
- "consumerProperties": {
- "bootstrap.servers": "localhost:9092"
+ "tuningConfig": {
+ "type": "kafka",
+ "maxRowsPerSegment": 5000000
},
- "taskCount": 1,
- "replicas": 1,
- "taskDuration": "PT1H"
- }
+ "ioConfig": {
+ "topic": "metrics_pb",
+ "consumerProperties": {
+ "bootstrap.servers": "localhost:9092"
+ },
+ "inputFormat": {
+ "type": "protobuf",
+ "protoBytesDecoder": {
+ "type": "file",
+ "descriptor": "file:///tmp/metrics.desc",
+ "protoMessageType": "Metrics"
+ },
+ "flattenSpec": {
+ "useFieldDiscovery": true
+ },
+ "binaryAsString": false
+ },
+ "taskCount": 1,
+ "replicas": 1,
+ "taskDuration": "PT1H",
+ "type": "kafka"
+ }
+}
Review comment:
An extra `}` may have gotten added here.
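Imbalances like this are easy to catch mechanically: running the example through any JSON parser flags the stray brace. A minimal sketch (the helper name is mine, not part of the docs):

```python
import json

def check_json(text):
    """Return None if text parses as JSON, otherwise the parser's error."""
    try:
        json.loads(text)
        return None
    except json.JSONDecodeError as err:
        return str(err)

print(check_json('{"type": "kafka"}'))   # balanced: prints None
print(check_json('{"type": "kafka"}}'))  # extra brace: reports "Extra data"
```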
##########
File path: docs/ingestion/data-formats.md
##########
@@ -422,11 +427,20 @@ Multiple Instances:
### Avro OCF
-> You need to include the
[`druid-avro-extensions`](../development/extensions-core/avro.md) as an
extension to use the Avro OCF input format.
+To use the Avro OCF input format, load the Druid Avro extension
([`druid-avro-extensions`](../development/extensions-core/avro.md)).
-> See the [Avro Types](../development/extensions-core/avro.md#avro-types)
section for how Avro types are handled in Druid
+See the [Avro Types](../development/extensions-core/avro.md#avro-types)
section for how Avro types are handled in Druid.
-The `inputFormat` to load data of Avro OCF format. An example is:
+Configure the Avro OCF `inputFormat` to load Avro OCF data as follows:
+
+| Field | Type | Description | Required |
+|-------|------|-------------|----------|
+|type| String| This should be set to `avro_ocf` to read Avro OCF file| yes |
+|flattenSpec| JSON Object |Define a [`flattenSpec`](#flattenspec) to extract
nested values from a Avro records. Note that only 'path' expression are
supported ('jq' is unavailable).| no (default will auto-discover 'root' level
properties) |
Review comment:
```suggestion
|flattenSpec| JSON Object |Define a [`flattenSpec`](#flattenspec) to extract
nested values from Avro records. Note that only 'path' expressions are
supported; 'jq' is unavailable.| no (default will auto-discover 'root' level
properties) |
```
##########
File path: docs/ingestion/data-formats.md
##########
@@ -498,18 +513,18 @@ The `inputFormat` to load data of Protobuf format. An
example is:
}
```
-| Field | Type | Description | Required |
-|-------|------|-------------|----------|
-|type| String| This should be set to `protobuf` to read Protobuf serialized
data| yes |
-|flattenSpec| JSON Object |Define a [`flattenSpec`](#flattenspec) to extract
nested values from a Protobuf record. Note that only 'path' expression are
supported ('jq' is unavailable).| no (default will auto-discover 'root' level
properties) |
-|`protoBytesDecoder`| JSON Object |Specifies how to decode bytes to Protobuf
record. | yes |
-
### FlattenSpec
-The `flattenSpec` is located in `inputFormat` → `flattenSpec` and is
responsible for
-bridging the gap between potentially nested input data (such as JSON, Avro,
etc) and Druid's flat data model.
-An example `flattenSpec` is:
+The `flattenSpec` bridges the gap between potentially nested input data (such
as JSON, Avro, etc) and Druid's flat data model. It is a object within the
`inputFormat` object.
Review comment:
```suggestion
The `flattenSpec` bridges the gap between potentially nested input data
(such as JSON, Avro, etc) and Druid's flat data model. It is an object within
the `inputFormat` object.
```
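To sketch what that bridging looks like in practice: each 'path' field maps a nested value to a flat column name. The toy resolver below handles only simple dotted paths (real flattenSpec 'path' expressions support full JsonPath); all names are illustrative:

```python
def resolve_path(record, expr):
    """Resolve a simple '$.a.b' dotted path into a nested dict.
    Real 'path' expressions are JsonPath; this toy version is not."""
    node = record
    for key in expr.split(".")[1:]:  # drop the leading "$"
        node = node[key]
    return node

def flatten(record, fields):
    """Map nested input to flat columns, as a flattenSpec's fields do."""
    return {f["name"]: resolve_path(record, f["expr"]) for f in fields}

record = {"user": {"id": 1338}, "page": "Gypsy Danger"}
fields = [{"type": "path", "name": "userId", "expr": "$.user.id"}]
print(flatten(record, fields))  # {'userId': 1338}
```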
##########
File path: docs/ingestion/native-batch.md
##########
@@ -772,10 +773,10 @@ The tuningConfig is optional and default parameters will
be used if no tuningCon
|numShards|Deprecated. Use `partitionsSpec` instead. Directly specify the
number of shards to create. If this is specified and `intervals` is specified
in the `granularitySpec`, the index task can skip the determine
intervals/partitions pass through the data. `numShards` cannot be specified if
`maxRowsPerSegment` is set.|null|no|
|partitionDimensions|Deprecated. Use `partitionsSpec` instead. The dimensions
to partition on. Leave blank to select all dimensions. Only used with
`forceGuaranteedRollup` = true, will be ignored otherwise.|null|no|
|partitionsSpec|Defines how to partition data in each timeChunk, see
[PartitionsSpec](#partitionsspec)|`dynamic` if `forceGuaranteedRollup` = false,
`hashed` if `forceGuaranteedRollup` = true|no|
-|indexSpec|Defines segment storage format options to be used at indexing time,
see [IndexSpec](index.md#indexspec)|null|no|
-|indexSpecForIntermediatePersists|Defines segment storage format options to be
used at indexing time for intermediate persisted temporary segments. this can
be used to disable dimension/metric compression on intermediate segments to
reduce memory required for final merging. however, disabling compression on
intermediate segments might increase page cache use while they are used before
getting merged into final segment published, see
[IndexSpec](index.md#indexspec) for possible values.|same as indexSpec|no|
+|indexSpec|Defines segment storage format options to be used at indexing time,
see [IndexSpec](ingestion-spec.md#indexspec)|null|no|
+|indexSpecForIntermediatePersists|Defines segment storage format options to be
used at indexing time for intermediate persisted temporary segments. this can
be used to disable dimension/metric compression on intermediate segments to
reduce memory required for final merging. however, disabling compression on
intermediate segments might increase page cache use while they are used before
getting merged into final segment published, see
[IndexSpec](ingestion-spec.md#indexspec) for possible values.|same as
indexSpec|no|
Review comment:
```suggestion
|indexSpecForIntermediatePersists|Defines segment storage format options to
be used at indexing time for intermediate persisted temporary segments. This
can be used to disable dimension/metric compression on intermediate segments to
reduce memory required for final merging. However, disabling compression on
intermediate segments might increase page cache use while they are used before
getting merged into final segment published. See
[`IndexSpec`](ingestion-spec.md#indexspec) for possible values.|same as
`indexSpec`|no|
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]