techdocsmith commented on a change in pull request #11428:
URL: https://github.com/apache/druid/pull/11428#discussion_r669774384



##########
File path: docs/querying/multi-value-dimensions.md
##########
@@ -23,27 +23,52 @@ title: "Multi-value dimensions"
   -->
 
 
-Apache Druid supports "multi-value" string dimensions. These are generated when an input field contains an
-array of values instead of a single value (e.g. JSON arrays, or a TSV field containing one or more `listDelimiter`
-characters). By default Druid ingests the values in alphabetical order, see [Dimension Objects](../ingestion/index.md#dimension-objects) for configuration.
+Apache Druid supports "multi-value" string dimensions, which result from input fields that contain an
+array of values instead of a single value.

Review comment:
       It might be good to show an example of input data and how Druid
       represents it here (as setup for the `tags` example later?)
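
       A minimal sketch of what such an "input -> Druid" example could convey,
       assuming a TSV source whose `listDelimiter` is configured as `|` (an
       assumption made for illustration only). The helper `parse_tsv_field` is
       hypothetical and is not Druid code; it only models how a delimited field
       would end up as the same multi-value `tags` as the JSON array in row1 of
       the example data.

```python
import json


def parse_tsv_field(field, list_delimiter):
    """Hypothetical model: split one TSV field on the list delimiter.

    A field without the delimiter stays a plain single value.
    """
    parts = field.split(list_delimiter)
    return parts if len(parts) > 1 else parts[0]


# JSON input: an array is already a multi-value field.
json_row = json.loads(
    '{"timestamp": "2011-01-12T00:00:00.000Z", "tags": ["t1", "t2", "t3"]}'
)

# TSV input: columns are tab-separated, values inside a column use "|"
# (assumed listDelimiter for this sketch).
tsv_line = "2011-01-12T00:00:00.000Z\tt1|t2|t3"
timestamp, raw_tags = tsv_line.split("\t")
tsv_row = {"timestamp": timestamp, "tags": parse_tsv_field(raw_tags, "|")}

# Both inputs describe the same multi-value dimension.
print(json_row == tsv_row)
```

       Both rows carry the same three values for `tags`, which is the shape
       the later query examples rely on.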

##########
File path: docs/querying/multi-value-dimensions.md
##########
@@ -23,27 +23,52 @@ title: "Multi-value dimensions"
   -->
 
 
-Apache Druid supports "multi-value" string dimensions. These are generated when an input field contains an
-array of values instead of a single value (e.g. JSON arrays, or a TSV field containing one or more `listDelimiter`
-characters). By default Druid ingests the values in alphabetical order, see [Dimension Objects](../ingestion/index.md#dimension-objects) for configuration.
+Apache Druid supports "multi-value" string dimensions, which result from input fields that contain an
+array of values instead of a single value.
 
-This document describes the behavior of groupBy (topN has similar behavior) queries on multi-value dimensions when they
-are used as a dimension being grouped by. See the section on multi-value columns in
-[segments](../design/segments.md#multi-value-columns) for internal representation details. Examples in this document
+This document describes filtering and grouping behavior for multi-value dimensions. For information about the internal representation of multi-value dimensions, see
+[segments documentation](../design/segments.md#multi-value-columns). Examples in this document
 are in the form of [native Druid queries](querying.md). Refer to the [Druid SQL documentation](sql.md) for details
 about using multi-value string dimensions in SQL.
 
+## Overview
+
+At ingestion time, Druid can detect multi-value dimensions and configure the `dimensionsSpec` accordingly. It detects JSON arrays or CSV/TSV fields as multi-value dimensions.
+
+For TSV or CSV data, you can specify the multi-value delimiters using the `listDelimiter` field in the `parseSpec`. JSON data must be formatted as a JSON array to be ingested as a multi-value dimension. JSON data does not require `parseSpec` configuration.
+
+The following shows an example multi-value dimension named `tags` in a `dimensionsSpec`:
+
+```
+"dimensions": [
+  {
+    "type": "string",
+    "name": "tags",
+    "multiValueHandling": "SORTED_ARRAY",
+    "createBitmapIndex": true
+  }
+],
+```
+
+By default, Druid sorts values in multi-value dimensions. This behavior is controlled by the `SORTED_ARRAY` value of the `multiValueHandling` field. Alternatively, you can specify multi-value handling as:
+
+* `SORTED_SET`: results in the removal of duplicate values
+* `ARRAY`: retains the original order of the values
+
+See [Dimension Objects](../ingestion/index.md#dimension-objects) for information on configuring multi-value handling.
+
+
 ## Querying multi-value dimensions
 
-Suppose, you have a dataSource with a segment that contains the following rows, with a multi-value dimension
-called `tags`.
+The following sections describe filtering and grouping behavior based on the following example data, which includes a multi-value dimension, `tags`.
 
 ```
 {"timestamp": "2011-01-12T00:00:00.000Z", "tags": ["t1","t2","t3"]}  #row1

Review comment:
       TODO for later: prefer a "real-world" example over dummy tags.
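
       Whichever data set is chosen, it may help if the example row contains a
       duplicate and is not already sorted, so the `multiValueHandling` modes
       from the Overview are visibly different. A rough sketch of the intended
       behavior follows. The helper `apply_multi_value_handling` and the sample
       row are hypothetical and are not part of Druid; the sketch also assumes
       that `SORTED_SET` sorts in addition to removing duplicates, which the
       name suggests but the text above does not state.

```python
def apply_multi_value_handling(values, mode="SORTED_ARRAY"):
    """Hypothetical model of the multiValueHandling modes (not Druid code)."""
    if mode == "SORTED_ARRAY":
        # Default: values are sorted, duplicates are kept.
        return sorted(values)
    if mode == "SORTED_SET":
        # Duplicates are removed; sorting here is an assumption.
        return sorted(set(values))
    if mode == "ARRAY":
        # Original order and duplicates are retained.
        return list(values)
    raise ValueError(f"unknown multiValueHandling mode: {mode}")


# Illustrative row only; it is not taken from the Druid docs.
row = {"timestamp": "2011-01-12T00:00:00.000Z", "tags": ["t3", "t1", "t3", "t2"]}

for mode in ("SORTED_ARRAY", "SORTED_SET", "ARRAY"):
    print(mode, apply_multi_value_handling(row["tags"], mode))
```

       With a row like this, each mode yields a different stored value, which
       the current `["t1","t2","t3"]` rows cannot show.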




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


