techdocsmith commented on a change in pull request #11428:
URL: https://github.com/apache/druid/pull/11428#discussion_r669774384
##########
File path: docs/querying/multi-value-dimensions.md
##########
@@ -23,27 +23,52 @@ title: "Multi-value dimensions"
-->
-Apache Druid supports "multi-value" string dimensions. These are generated
when an input field contains an
-array of values instead of a single value (e.g. JSON arrays, or a TSV field
containing one or more `listDelimiter`
-characters). By default Druid ingests the values in alphabetical order, see
[Dimension Objects](../ingestion/index.md#dimension-objects) for configuration.
+Apache Druid supports "multi-value" string dimensions, which result from input
fields that contain an
+array of values instead of a single value.
Review comment:
It might be good to show an example here of input data and how Druid stores
it (as setup for the `tags` example later?).
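For instance, here is a rough sketch of what such an example could illustrate. This is Python for illustration only, not Druid code: `store_multi_value` is a hypothetical helper that mimics the `multiValueHandling` modes described in this file, and the sample row is made up (with a duplicate tag added so the modes differ).

```python
import json


def store_multi_value(values, handling="SORTED_ARRAY"):
    """Hypothetical helper: mimic the documented multiValueHandling modes."""
    if handling == "SORTED_ARRAY":
        return sorted(values)        # sorted, duplicates kept
    if handling == "SORTED_SET":
        return sorted(set(values))   # sorted, duplicates removed
    if handling == "ARRAY":
        return list(values)          # original input order kept
    raise ValueError(f"unknown multiValueHandling: {handling}")


# Made-up input row with an unsorted, duplicated `tags` array.
row = json.loads(
    '{"timestamp": "2011-01-12T00:00:00.000Z", "tags": ["t3", "t1", "t3", "t2"]}'
)

for mode in ("SORTED_ARRAY", "SORTED_SET", "ARRAY"):
    print(mode, store_multi_value(row["tags"], mode))
```

Something along these lines (input row on one side, stored values on the other) might make the default sorting behavior easier to follow before the `tags` query examples.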
##########
File path: docs/querying/multi-value-dimensions.md
##########
@@ -23,27 +23,52 @@ title: "Multi-value dimensions"
-->
-Apache Druid supports "multi-value" string dimensions. These are generated
when an input field contains an
-array of values instead of a single value (e.g. JSON arrays, or a TSV field
containing one or more `listDelimiter`
-characters). By default Druid ingests the values in alphabetical order, see
[Dimension Objects](../ingestion/index.md#dimension-objects) for configuration.
+Apache Druid supports "multi-value" string dimensions, which result from input
fields that contain an
+array of values instead of a single value.
-This document describes the behavior of groupBy (topN has similar behavior)
queries on multi-value dimensions when they
-are used as a dimension being grouped by. See the section on multi-value
columns in
-[segments](../design/segments.md#multi-value-columns) for internal
representation details. Examples in this document
+This document describes filtering and grouping behavior for multi-value
dimensions. For information about the internal representation of multi-value
dimensions, see
+[segments documentation](../design/segments.md#multi-value-columns). Examples
in this document
are in the form of [native Druid queries](querying.md). Refer to the [Druid
SQL documentation](sql.md) for details
about using multi-value string dimensions in SQL.
+## Overview
+
+At ingestion time, Druid can detect multi-value dimensions and configure the
`dimensionsSpec` accordingly. It detects JSON arrays or CSV/TSV fields as
multi-value dimensions.
+
+For TSV or CSV data, you can specify the multi-value delimiters using the
`listDelimiter` field in the `parseSpec`. JSON data must be formatted as a JSON
array to be ingested as a multi-value dimension. JSON data does not require
`parseSpec` configuration.
+
+The following shows an example multi-value dimension named `tags` in a
`dimensionsSpec`:
+
+```
+"dimensions": [
+ {
+ "type": "string",
+ "name": "tags",
+ "multiValueHandling": "SORTED_ARRAY",
+ "createBitmapIndex": true
+ }
+],
+```
+
+By default, Druid sorts values in multi-value dimensions. This behavior is
controlled by the `SORTED_ARRAY` value of the `multiValueHandling` field.
Alternatively, you can specify multi-value handling as:
+
+* `SORTED_SET`: results in the removal of duplicate values
+* `ARRAY`: retains the original order of the values
+
+See [Dimension Objects](../ingestion/index.md#dimension-objects) for
information on configuring multi-value handling.
+
+
## Querying multi-value dimensions
-Suppose, you have a dataSource with a segment that contains the following
rows, with a multi-value dimension
-called `tags`.
+The following sections describe filtering and grouping behavior based on the
following example data, which includes a multi-value dimension, `tags`.
```
{"timestamp": "2011-01-12T00:00:00.000Z", "tags": ["t1","t2","t3"]} #row1
Review comment:
TODO for later: prefer a "real-world" example over dummy tags.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.