gianm commented on code in PR #15245:
URL: https://github.com/apache/druid/pull/15245#discussion_r1372301620
##########
docs/multi-stage-query/concepts.md:
##########
@@ -88,6 +88,9 @@ When deciding whether to use `REPLACE` or `INSERT`, keep in
mind that segments g
with dimension-based pruning but those generated with `INSERT` cannot. For
more information about the requirements
for dimension-based pruning, see [Clustering](#clustering).
+To insert [ARRAY types](../querying/arrays.md), be sure to set context flag
`"arrayIngestMode":"array"` which allows
Review Comment:
Hmm, this seems like the wrong place to put this. It's generic docs about
`INSERT`; we don't want to gunk it up with stuff about specific data that might
be inserted. (Otherwise this would be, like, 10 times longer.)
I suggest cutting it, and relying on the examples and the array docs to
guide people.
##########
docs/querying/arrays.md:
##########
@@ -0,0 +1,228 @@
+---
+id: arrays
+title: "Array columns"
Review Comment:
"Arrays" is a better title and scope — as people can use arrays even if they
don't have array columns. For example they can use `MV_TO_ARRAY`, `ARRAY_AGG`,
etc.
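For instance, something along these lines works on a plain table with no ARRAY columns at all (a sketch; `mytable` and its multi-value string column `tags` are made-up names):

```sql
-- "mytable" and "tags" are hypothetical; "tags" is a multi-value string
-- column, not an ARRAY column, yet the query still produces arrays.
SELECT
  "label",
  MV_TO_ARRAY("tags") AS "tagsArray"  -- multi-value string converted to an ARRAY
FROM "mytable"
```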
##########
docs/multi-stage-query/concepts.md:
##########
@@ -192,10 +195,13 @@ To perform ingestion with rollup:
2. Set [`finalizeAggregations: false`](reference.md#context-parameters) in
your context. This causes aggregation
functions to write their internal state to the generated segments, instead
of the finalized end result, and enables
further aggregation at query time.
-3. Wrap all multi-value strings in `MV_TO_ARRAY(...)` and set
[`groupByEnableMultiValueUnnesting:
- false`](reference.md#context-parameters) in your context. This ensures that
multi-value strings are left alone and
- remain lists, instead of being [automatically
unnested](../querying/sql-data-types.md#multi-value-strings) by the
- `GROUP BY` operator.
+3. To ingest [Druid multi-value
dimensions](../querying/multi-value-dimensions.md), wrap all multi-value
strings
Review Comment:
This direction has become too complicated for people to understand, so I
think we'll need an example.
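Roughly along these lines, perhaps (a sketch only; the datasource and column names are made up):

```sql
-- Context: {"finalizeAggregations": false, "groupByEnableMultiValueUnnesting": false}
-- "wikipedia_rollup", "wikipedia", and "tags" are hypothetical names.
INSERT INTO "wikipedia_rollup"
SELECT
  TIME_FLOOR("__time", 'PT1H') AS "__time",
  MV_TO_ARRAY("tags") AS "tags",  -- keep the multi-value string intact as a list
  COUNT(*) AS "cnt"
FROM "wikipedia"
GROUP BY 1, 2
PARTITIONED BY DAY
```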
##########
docs/querying/arrays.md:
##########
@@ -0,0 +1,228 @@
+---
+id: arrays
+title: "Array columns"
+---
+
+<!--
+ ~ Licensed to the Apache Software Foundation (ASF) under one
+ ~ or more contributor license agreements. See the NOTICE file
+ ~ distributed with this work for additional information
+ ~ regarding copyright ownership. The ASF licenses this file
+ ~ to you under the Apache License, Version 2.0 (the
+ ~ "License"); you may not use this file except in compliance
+ ~ with the License. You may obtain a copy of the License at
+ ~
+ ~ http://www.apache.org/licenses/LICENSE-2.0
+ ~
+ ~ Unless required by applicable law or agreed to in writing,
+ ~ software distributed under the License is distributed on an
+ ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ ~ KIND, either express or implied. See the License for the
+ ~ specific language governing permissions and limitations
+ ~ under the License.
+ -->
+
+
+Apache Druid supports SQL standard `ARRAY` typed columns for `STRING`, `LONG`,
and `DOUBLE` types. Other more complicated ARRAY types must be stored in
[nested columns](nested-columns.md). Druid ARRAY types are distinct from
[multi-value dimensions](multi-value-dimensions.md), which have significantly
different behavior than standard arrays.
+
+This document describes inserting, filtering, and grouping behavior for
`ARRAY` typed columns.
+Refer to the [Druid SQL data type documentation](sql-data-types.md#arrays) and
[SQL array function reference](sql-array-functions.md) for additional details
+about the functions available to use with ARRAY columns and types in SQL.
+
+The following sections describe inserting, filtering, and grouping behavior
based on the following example data, which includes 3 array typed columns.
+
+```json lines
+{"timestamp": "2023-01-01T00:00:00", "label": "row1", "arrayString": ["a",
"b"], "arrayLong":[1, null,3], "arrayDouble":[1.1, 2.2, null]}
+{"timestamp": "2023-01-01T00:00:00", "label": "row2", "arrayString": [null,
"b"], "arrayLong":null, "arrayDouble":[999, null, 5.5]}
+{"timestamp": "2023-01-01T00:00:00", "label": "row3", "arrayString": [],
"arrayLong":[1, 2, 3], "arrayDouble":[null, 2.2, 1.1]}
+{"timestamp": "2023-01-01T00:00:00", "label": "row4", "arrayString": ["a",
"b"], "arrayLong":[1, 2, 3], "arrayDouble":[]}
+{"timestamp": "2023-01-01T00:00:00", "label": "row5", "arrayString": null,
"arrayLong":[], "arrayDouble":null}
+```
+
+## Overview
+
+When using [native ingestion](../ingestion/native-batch.md), arrays can be
ingested using the [`"auto"`](../ingestion/ingestion-spec.md#dimension-objects)
type dimension schema which is shared with [type-aware schema
discovery](../ingestion/schema-design.md#type-aware-schema-discovery).
+
+When ingesting from TSV or CSV data, you can specify the array delimiters
using the `listDelimiter` field in the `inputFormat`. JSON data must be
formatted as a JSON array to be ingested as an array type. JSON data does not
require `inputFormat` configuration.
+
+The following shows an example `dimensionsSpec` for native ingestion of the
data used in this document:
+
+```
+"dimensions": [
+ {
+ "type": "auto",
+ "name": "label"
+ },
+ {
+ "type": "auto",
+ "name": "arrayString"
+ },
+ {
+ "type": "auto",
+ "name": "arrayLong"
+ },
+ {
+ "type": "auto",
+ "name": "arrayDouble"
+ }
+],
+```
+
+Arrays can also be inserted with [multi-stage
ingestion](../multi-stage-query/index.md), but must include a query context
parameter `"arrayIngestMode":"array"`.
Review Comment:
Sort of unclear what the verb "include" refers to. The sentence construction
makes it sound like the arrays themselves must include the context parameter.
But that isn't right. Also, "multi-stage ingestion" isn't a thing 🙂— it's
"SQL-based ingestion" or "multi-stage query".
So, suggestion:
> Arrays can also be inserted with [SQL-based
ingestion](../multi-stage-query/index.md) when you use the context parameter
`"arrayIngestMode": "array"`.
Also link the text `context parameter` to
`docs/multi-stage-query/reference.md#context`.
Also include some text about what will happen if you _don't_ do
`arrayIngestMode: array`. Something like: string arrays will be converted to
multi-value dimensions, and numeric arrays will cause the query to fail with an
error (what error?)
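As a concrete illustration for the suggested wording, the parameter goes in the `context` object of the SQL-based ingestion request body (the query string here is just a placeholder):

```json
{
  "query": "INSERT INTO ... SELECT ... FROM TABLE(EXTERN(...))",
  "context": {
    "arrayIngestMode": "array"
  }
}
```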
##########
docs/querying/arrays.md:
##########
@@ -0,0 +1,228 @@
+---
+id: arrays
+title: "Array columns"
+---
+
+<!--
+ ~ Licensed to the Apache Software Foundation (ASF) under one
+ ~ or more contributor license agreements. See the NOTICE file
+ ~ distributed with this work for additional information
+ ~ regarding copyright ownership. The ASF licenses this file
+ ~ to you under the Apache License, Version 2.0 (the
+ ~ "License"); you may not use this file except in compliance
+ ~ with the License. You may obtain a copy of the License at
+ ~
+ ~ http://www.apache.org/licenses/LICENSE-2.0
+ ~
+ ~ Unless required by applicable law or agreed to in writing,
+ ~ software distributed under the License is distributed on an
+ ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ ~ KIND, either express or implied. See the License for the
+ ~ specific language governing permissions and limitations
+ ~ under the License.
+ -->
+
+
+Apache Druid supports SQL standard `ARRAY` typed columns for `STRING`, `LONG`,
and `DOUBLE` types. Other more complicated ARRAY types must be stored in
[nested columns](nested-columns.md). Druid ARRAY types are distinct from
[multi-value dimensions](multi-value-dimensions.md), which have significantly
different behavior than standard arrays.
+
+This document describes inserting, filtering, and grouping behavior for
`ARRAY` typed columns.
+Refer to the [Druid SQL data type documentation](sql-data-types.md#arrays) and
[SQL array function reference](sql-array-functions.md) for additional details
+about the functions available to use with ARRAY columns and types in SQL.
+
+The following sections describe inserting, filtering, and grouping behavior
based on the following example data, which includes 3 array typed columns.
+
+```json lines
+{"timestamp": "2023-01-01T00:00:00", "label": "row1", "arrayString": ["a",
"b"], "arrayLong":[1, null,3], "arrayDouble":[1.1, 2.2, null]}
+{"timestamp": "2023-01-01T00:00:00", "label": "row2", "arrayString": [null,
"b"], "arrayLong":null, "arrayDouble":[999, null, 5.5]}
+{"timestamp": "2023-01-01T00:00:00", "label": "row3", "arrayString": [],
"arrayLong":[1, 2, 3], "arrayDouble":[null, 2.2, 1.1]}
+{"timestamp": "2023-01-01T00:00:00", "label": "row4", "arrayString": ["a",
"b"], "arrayLong":[1, 2, 3], "arrayDouble":[]}
+{"timestamp": "2023-01-01T00:00:00", "label": "row5", "arrayString": null,
"arrayLong":[], "arrayDouble":null}
+```
+
+## Overview
+
+When using [native ingestion](../ingestion/native-batch.md), arrays can be
ingested using the [`"auto"`](../ingestion/ingestion-spec.md#dimension-objects)
type dimension schema which is shared with [type-aware schema
discovery](../ingestion/schema-design.md#type-aware-schema-discovery).
+
+When ingesting from TSV or CSV data, you can specify the array delimiters
using the `listDelimiter` field in the `inputFormat`. JSON data must be
formatted as a JSON array to be ingested as an array type. JSON data does not
require `inputFormat` configuration.
+
+The following shows an example `dimensionsSpec` for native ingestion of the
data used in this document:
+
+```
+"dimensions": [
+ {
+ "type": "auto",
+ "name": "label"
+ },
+ {
+ "type": "auto",
+ "name": "arrayString"
+ },
+ {
+ "type": "auto",
+ "name": "arrayLong"
+ },
+ {
+ "type": "auto",
+ "name": "arrayDouble"
+ }
+],
+```
+
+Arrays can also be inserted with [multi-stage
ingestion](../multi-stage-query/index.md), but must include a query context
parameter `"arrayIngestMode":"array"`.
+
+For example, to insert the data used in this document:
+```sql
+REPLACE INTO "array_example" OVERWRITE ALL
+WITH "ext" AS (
+ SELECT *
+ FROM TABLE(
+ EXTERN(
+ '{"type":"inline","data":"{\"timestamp\": \"2023-01-01T00:00:00\",
\"label\": \"row1\", \"arrayString\": [\"a\", \"b\"], \"arrayLong\":[1,
null,3], \"arrayDouble\":[1.1, 2.2, null]}\n{\"timestamp\":
\"2023-01-01T00:00:00\", \"label\": \"row2\", \"arrayString\": [null, \"b\"],
\"arrayLong\":null, \"arrayDouble\":[999, null, 5.5]}\n{\"timestamp\":
\"2023-01-01T00:00:00\", \"label\": \"row3\", \"arrayString\": [],
\"arrayLong\":[1, 2, 3], \"arrayDouble\":[null, 2.2, 1.1]} \n{\"timestamp\":
\"2023-01-01T00:00:00\", \"label\": \"row4\", \"arrayString\": [\"a\", \"b\"],
\"arrayLong\":[1, 2, 3], \"arrayDouble\":[]}\n{\"timestamp\":
\"2023-01-01T00:00:00\", \"label\": \"row5\", \"arrayString\": null,
\"arrayLong\":[], \"arrayDouble\":null}"}',
+ '{"type":"json"}',
+ '[{"name":"timestamp", "type":"STRING"},{"name":"label",
"type":"STRING"},{"name":"arrayString",
"type":"ARRAY<STRING>"},{"name":"arrayLong",
"type":"ARRAY<LONG>"},{"name":"arrayDouble", "type":"ARRAY<DOUBLE>"}]'
+ )
+ )
+)
+SELECT
+ TIME_PARSE("timestamp") AS "__time",
+ "label",
+ "arrayString",
+ "arrayLong",
+ "arrayDouble"
+FROM "ext"
+PARTITIONED BY DAY
+```
+
+These input arrays can also be grouped for rollup:
+
+```sql
+REPLACE INTO "array_example_rollup" OVERWRITE ALL
+WITH "ext" AS (
+ SELECT *
+ FROM TABLE(
+ EXTERN(
+ '{"type":"inline","data":"{\"timestamp\": \"2023-01-01T00:00:00\",
\"label\": \"row1\", \"arrayString\": [\"a\", \"b\"], \"arrayLong\":[1,
null,3], \"arrayDouble\":[1.1, 2.2, null]}\n{\"timestamp\":
\"2023-01-01T00:00:00\", \"label\": \"row2\", \"arrayString\": [null, \"b\"],
\"arrayLong\":null, \"arrayDouble\":[999, null, 5.5]}\n{\"timestamp\":
\"2023-01-01T00:00:00\", \"label\": \"row3\", \"arrayString\": [],
\"arrayLong\":[1, 2, 3], \"arrayDouble\":[null, 2.2, 1.1]} \n{\"timestamp\":
\"2023-01-01T00:00:00\", \"label\": \"row4\", \"arrayString\": [\"a\", \"b\"],
\"arrayLong\":[1, 2, 3], \"arrayDouble\":[]}\n{\"timestamp\":
\"2023-01-01T00:00:00\", \"label\": \"row5\", \"arrayString\": null,
\"arrayLong\":[], \"arrayDouble\":null}"}',
+ '{"type":"json"}',
+ '[{"name":"timestamp", "type":"STRING"},{"name":"label",
"type":"STRING"},{"name":"arrayString",
"type":"ARRAY<STRING>"},{"name":"arrayLong",
"type":"ARRAY<LONG>"},{"name":"arrayDouble", "type":"ARRAY<DOUBLE>"}]'
+ )
+ )
+)
+SELECT
+ TIME_PARSE("timestamp") AS "__time",
+ "label",
+ "arrayString",
+ "arrayLong",
+ "arrayDouble"
+FROM "ext"
+GROUP BY 1,2,3,4,5
+PARTITIONED BY DAY
+```
+
+
+## Querying arrays
+
+### Filtering
+
+All query types, as well as [filtered
aggregators](aggregations.md#filtered-aggregator), can filter on array typed
columns. Filters follow these rules for array types:
+
+- Value filters, such as "equality" and "range", match on entire array values
+- The "null" filter matches rows where the entire array value is null
+- Array-specific functions like `ARRAY_CONTAINS` and `ARRAY_OVERLAP` follow the
behavior specified by those functions
+- All other filters do not directly support ARRAY types
+
+#### Example: equality
+```sql
+SELECT *
+FROM "array_example"
+WHERE arrayLong = ARRAY[1,2,3]
+```
+
+```json lines
+{"__time":"2023-01-01T00:00:00.000Z","label":"row3","arrayString":"[]","arrayLong":"[1,2,3]","arrayDouble":"[null,2.2,1.1]"}
+{"__time":"2023-01-01T00:00:00.000Z","label":"row4","arrayString":"[\"a\",\"b\"]","arrayLong":"[1,2,3]","arrayDouble":"[]"}
+```
+
+#### Example: null
+```sql
+SELECT *
+FROM "array_example"
+WHERE arrayLong is null
Review Comment:
style nit: capitalize SQL keywords like `IS NULL`
##########
docs/querying/arrays.md:
##########
@@ -0,0 +1,228 @@
+---
+id: arrays
+title: "Array columns"
+---
+
+<!--
+ ~ Licensed to the Apache Software Foundation (ASF) under one
+ ~ or more contributor license agreements. See the NOTICE file
+ ~ distributed with this work for additional information
+ ~ regarding copyright ownership. The ASF licenses this file
+ ~ to you under the Apache License, Version 2.0 (the
+ ~ "License"); you may not use this file except in compliance
+ ~ with the License. You may obtain a copy of the License at
+ ~
+ ~ http://www.apache.org/licenses/LICENSE-2.0
+ ~
+ ~ Unless required by applicable law or agreed to in writing,
+ ~ software distributed under the License is distributed on an
+ ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ ~ KIND, either express or implied. See the License for the
+ ~ specific language governing permissions and limitations
+ ~ under the License.
+ -->
+
+
+Apache Druid supports SQL standard `ARRAY` typed columns for `STRING`, `LONG`,
and `DOUBLE` types. Other more complicated ARRAY types must be stored in
[nested columns](nested-columns.md). Druid ARRAY types are distinct from
[multi-value dimensions](multi-value-dimensions.md), which have significantly
different behavior than standard arrays.
+
+This document describes inserting, filtering, and grouping behavior for
`ARRAY` typed columns.
+Refer to the [Druid SQL data type documentation](sql-data-types.md#arrays) and
[SQL array function reference](sql-array-functions.md) for additional details
+about the functions available to use with ARRAY columns and types in SQL.
+
+The following sections describe inserting, filtering, and grouping behavior
based on the following example data, which includes 3 array typed columns.
+
+```json lines
+{"timestamp": "2023-01-01T00:00:00", "label": "row1", "arrayString": ["a",
"b"], "arrayLong":[1, null,3], "arrayDouble":[1.1, 2.2, null]}
+{"timestamp": "2023-01-01T00:00:00", "label": "row2", "arrayString": [null,
"b"], "arrayLong":null, "arrayDouble":[999, null, 5.5]}
+{"timestamp": "2023-01-01T00:00:00", "label": "row3", "arrayString": [],
"arrayLong":[1, 2, 3], "arrayDouble":[null, 2.2, 1.1]}
+{"timestamp": "2023-01-01T00:00:00", "label": "row4", "arrayString": ["a",
"b"], "arrayLong":[1, 2, 3], "arrayDouble":[]}
+{"timestamp": "2023-01-01T00:00:00", "label": "row5", "arrayString": null,
"arrayLong":[], "arrayDouble":null}
+```
Review Comment:
Somewhere around here we should have a section "String arrays vs.
multi-value dimensions" that sets people straight about the differences.
Suggested text:
> Avoid confusing string arrays with multi-value dimensions (link to MVD
docs). Arrays and multi-value dimensions are stored in different column types,
and query behavior is different. You can use the functions `MV_TO_ARRAY` and
`ARRAY_TO_MV` to convert between the two if needed. In general, we recommend
using arrays whenever possible, since they are a newer and more powerful
feature.
>
> Use care during ingestion to ensure you get the type you want.
>
> To get arrays when performing an ingestion using JSON ingestion specs,
such as native batch (link) or streaming ingestion (link), use dimension type
`auto` or enable `useSchemaDiscovery`. When performing a SQL-based ingestion,
write a query that generates arrays and set the context parameter
`arrayIngestMode: array`.
>
> To get multi-value dimensions when performing an ingestion using JSON
ingestion specs, use dimension type `string` and do not enable
`useSchemaDiscovery`. When performing a SQL-based ingestion, wrap arrays in
`ARRAY_TO_MV` (link to examples). Note that multi-value dimensions must be
strings, even though arrays may be strings or numeric.
>
> You can tell which type you have by checking the
`INFORMATION_SCHEMA.COLUMNS` table, using a query like `SELECT COLUMN_NAME,
DATA_TYPE FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = 'mytable'`. Arrays
are type `ARRAY`, multi-value strings are type `VARCHAR`.
I suggest including the same exact text in `multi-value-dimensions.md`, or
at least linking to this section prominently.
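To illustrate the `ARRAY_TO_MV` path in the suggested text (a sketch based on the `array_example` table from this doc; the destination table name is hypothetical):

```sql
-- Stores "arrayString" as a multi-value string dimension rather than an
-- ARRAY column; "mvd_example" is a made-up destination name.
INSERT INTO "mvd_example"
SELECT
  "__time",
  "label",
  ARRAY_TO_MV("arrayString") AS "stringMvd"
FROM "array_example"
PARTITIONED BY DAY
```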
##########
docs/querying/sql-data-types.md:
##########
@@ -75,6 +75,17 @@ Casts between two SQL types that have different Druid
runtime types generate a r
If a value cannot be cast to the target type, as in `CAST('foo' AS BIGINT)`,
Druid substitutes [NULL](#null-values).
When `druid.generic.useDefaultValueForNull = true` (legacy mode), Druid
instead substitutes a default value, including when NULL values cast to
non-nullable types. For example, if `druid.generic.useDefaultValueForNull =
true`, a null VARCHAR cast to BIGINT is converted to a zero.
+## Arrays
+
+Druid supports [ARRAY types](arrays.md), which behave as standard SQL arrays,
where results are grouped by matching entire arrays. The [`UNNEST`
operator](./sql-array-functions.md#unn) can be used to perform operations on
individual array elements, translating each element into a separate row.
+
+ARRAY typed columns can be stored in segments with classic JSON-based ingestion
using the 'auto' typed dimension schema shared with [schema
auto-discovery](../ingestion/schema-design.md#schema-auto-discovery-for-dimensions)
to detect and ingest arrays as ARRAY typed columns. For [SQL based
ingestion](../multi-stage-query/index.md), the query context parameter
`arrayIngestMode` must be specified as `"array"` to ingest ARRAY types. In
Druid 28, the default mode for this parameter is `'mvd'` for backwards
compatibility, which instead can only handle `ARRAY<STRING>` which it stores in
[multi-value string columns](#multi-value-strings).
Review Comment:
No real reason to have the extra single quotes in `'mvd'`. Doing `mvd` is
preferred.
##########
docs/querying/arrays.md:
##########
@@ -0,0 +1,228 @@
+---
+id: arrays
+title: "Array columns"
+---
+
+<!--
+ ~ Licensed to the Apache Software Foundation (ASF) under one
+ ~ or more contributor license agreements. See the NOTICE file
+ ~ distributed with this work for additional information
+ ~ regarding copyright ownership. The ASF licenses this file
+ ~ to you under the Apache License, Version 2.0 (the
+ ~ "License"); you may not use this file except in compliance
+ ~ with the License. You may obtain a copy of the License at
+ ~
+ ~ http://www.apache.org/licenses/LICENSE-2.0
+ ~
+ ~ Unless required by applicable law or agreed to in writing,
+ ~ software distributed under the License is distributed on an
+ ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ ~ KIND, either express or implied. See the License for the
+ ~ specific language governing permissions and limitations
+ ~ under the License.
+ -->
+
+
+Apache Druid supports SQL standard `ARRAY` typed columns for `STRING`, `LONG`,
and `DOUBLE` types. Other more complicated ARRAY types must be stored in
[nested columns](nested-columns.md). Druid ARRAY types are distinct from
[multi-value dimensions](multi-value-dimensions.md), which have significantly
different behavior than standard arrays.
+
+This document describes inserting, filtering, and grouping behavior for
`ARRAY` typed columns.
+Refer to the [Druid SQL data type documentation](sql-data-types.md#arrays) and
[SQL array function reference](sql-array-functions.md) for additional details
+about the functions available to use with ARRAY columns and types in SQL.
+
+The following sections describe inserting, filtering, and grouping behavior
based on the following example data, which includes 3 array typed columns.
+
+```json lines
+{"timestamp": "2023-01-01T00:00:00", "label": "row1", "arrayString": ["a",
"b"], "arrayLong":[1, null,3], "arrayDouble":[1.1, 2.2, null]}
+{"timestamp": "2023-01-01T00:00:00", "label": "row2", "arrayString": [null,
"b"], "arrayLong":null, "arrayDouble":[999, null, 5.5]}
+{"timestamp": "2023-01-01T00:00:00", "label": "row3", "arrayString": [],
"arrayLong":[1, 2, 3], "arrayDouble":[null, 2.2, 1.1]}
+{"timestamp": "2023-01-01T00:00:00", "label": "row4", "arrayString": ["a",
"b"], "arrayLong":[1, 2, 3], "arrayDouble":[]}
+{"timestamp": "2023-01-01T00:00:00", "label": "row5", "arrayString": null,
"arrayLong":[], "arrayDouble":null}
+```
+
+## Overview
+
+When using [native ingestion](../ingestion/native-batch.md), arrays can be
ingested using the [`"auto"`](../ingestion/ingestion-spec.md#dimension-objects)
type dimension schema which is shared with [type-aware schema
discovery](../ingestion/schema-design.md#type-aware-schema-discovery).
+
+When ingesting from TSV or CSV data, you can specify the array delimiters
using the `listDelimiter` field in the `inputFormat`. JSON data must be
formatted as a JSON array to be ingested as an array type. JSON data does not
require `inputFormat` configuration.
+
+The following shows an example `dimensionsSpec` for native ingestion of the
data used in this document:
+
+```
+"dimensions": [
+ {
+ "type": "auto",
+ "name": "label"
+ },
+ {
+ "type": "auto",
+ "name": "arrayString"
+ },
+ {
+ "type": "auto",
+ "name": "arrayLong"
+ },
+ {
+ "type": "auto",
+ "name": "arrayDouble"
+ }
+],
+```
+
+Arrays can also be inserted with [multi-stage
ingestion](../multi-stage-query/index.md), but must include a query context
parameter `"arrayIngestMode":"array"`.
+
+For example, to insert the data used in this document:
+```sql
+REPLACE INTO "array_example" OVERWRITE ALL
+WITH "ext" AS (
+ SELECT *
+ FROM TABLE(
+ EXTERN(
+ '{"type":"inline","data":"{\"timestamp\": \"2023-01-01T00:00:00\",
\"label\": \"row1\", \"arrayString\": [\"a\", \"b\"], \"arrayLong\":[1,
null,3], \"arrayDouble\":[1.1, 2.2, null]}\n{\"timestamp\":
\"2023-01-01T00:00:00\", \"label\": \"row2\", \"arrayString\": [null, \"b\"],
\"arrayLong\":null, \"arrayDouble\":[999, null, 5.5]}\n{\"timestamp\":
\"2023-01-01T00:00:00\", \"label\": \"row3\", \"arrayString\": [],
\"arrayLong\":[1, 2, 3], \"arrayDouble\":[null, 2.2, 1.1]} \n{\"timestamp\":
\"2023-01-01T00:00:00\", \"label\": \"row4\", \"arrayString\": [\"a\", \"b\"],
\"arrayLong\":[1, 2, 3], \"arrayDouble\":[]}\n{\"timestamp\":
\"2023-01-01T00:00:00\", \"label\": \"row5\", \"arrayString\": null,
\"arrayLong\":[], \"arrayDouble\":null}"}',
+ '{"type":"json"}',
+ '[{"name":"timestamp", "type":"STRING"},{"name":"label",
"type":"STRING"},{"name":"arrayString",
"type":"ARRAY<STRING>"},{"name":"arrayLong",
"type":"ARRAY<LONG>"},{"name":"arrayDouble", "type":"ARRAY<DOUBLE>"}]'
+ )
+ )
+)
+SELECT
+ TIME_PARSE("timestamp") AS "__time",
+ "label",
+ "arrayString",
+ "arrayLong",
+ "arrayDouble"
+FROM "ext"
+PARTITIONED BY DAY
+```
+
+These input arrays can also be grouped for rollup:
Review Comment:
Would be good to include an aggregation function here, since rollup examples
tend to be clearer with one of them. Even `COUNT(*)` is illustrative.
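For instance, the SELECT portion of the rollup example could become (same `ext` input as defined in the doc's example; only the aggregator is new):

```sql
SELECT
  TIME_PARSE("timestamp") AS "__time",
  "label",
  "arrayString",
  "arrayLong",
  "arrayDouble",
  COUNT(*) AS "cnt"  -- aggregation function makes the rollup behavior visible
FROM "ext"
GROUP BY 1, 2, 3, 4, 5
PARTITIONED BY DAY
```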
##########
docs/multi-stage-query/reference.md:
##########
@@ -232,23 +232,25 @@ If you're using the web console, you can specify the
context parameters through
The following table lists the context parameters for the MSQ task engine:
-| Parameter | Description
| Default value |
-|---|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---|
-| `maxNumTasks` | SELECT, INSERT, REPLACE<br /><br />The maximum total number
of tasks to launch, including the controller task. The lowest possible value
for this setting is 2: one controller and one worker. All tasks must be able to
launch simultaneously. If they cannot, the query returns a `TaskStartTimeout`
error code after approximately 10 minutes.<br /><br />May also be provided as
`numTasks`. If both are present, `maxNumTasks` takes priority.
| 2 |
-| `taskAssignment` | SELECT, INSERT, REPLACE<br /><br />Determines how many
tasks to use. Possible values include: <ul><li>`max`: Uses as many tasks as
possible, up to `maxNumTasks`.</li><li>`auto`: When file sizes can be
determined through directory listing (for example: local files, S3, GCS, HDFS)
uses as few tasks as possible without exceeding 512 MiB or 10,000 files per
task, unless exceeding these limits is necessary to stay within `maxNumTasks`.
When calculating the size of files, the weighted size is used, which considers
the file format and compression format used if any. When file sizes cannot be
determined through directory listing (for example: http), behaves the same as
`max`.</li></ul>
| `max` |
-| `finalizeAggregations` | SELECT, INSERT, REPLACE<br /><br />Determines the
type of aggregation to return. If true, Druid finalizes the results of complex
aggregations that directly appear in query results. If false, Druid returns the
aggregation's intermediate type rather than finalized type. This parameter is
useful during ingestion, where it enables storing sketches directly in Druid
tables. For more information about aggregations, see [SQL aggregation
functions](../querying/sql-aggregations.md).
| true |
-| `sqlJoinAlgorithm` | SELECT, INSERT, REPLACE<br /><br />Algorithm to use for
JOIN. Use `broadcast` (the default) for broadcast hash join or `sortMerge` for
sort-merge join. Affects all JOIN operations in the query. This is a hint to
the MSQ engine and the actual joins in the query may proceed in a different way
than specified. See [Joins](#joins) for more details.
| `broadcast` |
-| `rowsInMemory` | INSERT or REPLACE<br /><br />Maximum number of rows to
store in memory at once before flushing to disk during the segment generation
process. Ignored for non-INSERT queries. In most cases, use the default value.
You may need to override the default if you run into one of the [known
issues](./known-issues.md) around memory usage.
| 100,000 |
+| Parameter | Description | Default value |
+|---|---|---|
+| `maxNumTasks` | SELECT, INSERT, REPLACE<br /><br />The maximum total number
of tasks to launch, including the controller task. The lowest possible value
for this setting is 2: one controller and one worker. All tasks must be able to
launch simultaneously. If they cannot, the query returns a `TaskStartTimeout`
error code after approximately 10 minutes.<br /><br />May also be provided as
`numTasks`. If both are present, `maxNumTasks` takes priority. | 2 |
+| `taskAssignment` | SELECT, INSERT, REPLACE<br /><br />Determines how many
tasks to use. Possible values include: <ul><li>`max`: Uses as many tasks as
possible, up to `maxNumTasks`.</li><li>`auto`: When file sizes can be
determined through directory listing (for example: local files, S3, GCS, HDFS)
uses as few tasks as possible without exceeding 512 MiB or 10,000 files per
task, unless exceeding these limits is necessary to stay within `maxNumTasks`.
When calculating the size of files, the weighted size is used, which considers
the file format and compression format used if any. When file sizes cannot be
determined through directory listing (for example: http), behaves the same as
`max`.</li></ul> | `max` |
+| `finalizeAggregations` | SELECT, INSERT, REPLACE<br /><br />Determines the
type of aggregation to return. If true, Druid finalizes the results of complex
aggregations that directly appear in query results. If false, Druid returns the
aggregation's intermediate type rather than finalized type. This parameter is
useful during ingestion, where it enables storing sketches directly in Druid
tables. For more information about aggregations, see [SQL aggregation
functions](../querying/sql-aggregations.md). | true |
+| `sqlJoinAlgorithm` | SELECT, INSERT, REPLACE<br /><br />Algorithm to use for
JOIN. Use `broadcast` (the default) for broadcast hash join or `sortMerge` for
sort-merge join. Affects all JOIN operations in the query. This is a hint to
the MSQ engine and the actual joins in the query may proceed in a different way
than specified. See [Joins](#joins) for more details. | `broadcast` |
+| `rowsInMemory` | INSERT or REPLACE<br /><br />Maximum number of rows to
store in memory at once before flushing to disk during the segment generation
process. Ignored for non-INSERT queries. In most cases, use the default value.
You may need to override the default if you run into one of the [known
issues](./known-issues.md) around memory usage. | 100,000 |
| `segmentSortOrder` | INSERT or REPLACE<br /><br />Normally, Druid sorts rows
in individual segments using `__time` first, followed by the [CLUSTERED
BY](#clustered-by) clause. When you set `segmentSortOrder`, Druid sorts rows in
segments using this column list first, followed by the CLUSTERED BY order.<br
/><br />You provide the column list as comma-separated values or as a JSON
array in string form. If your query includes `__time`, then this list must
begin with `__time`. For example, consider an INSERT query that uses `CLUSTERED
BY country` and has `segmentSortOrder` set to `__time,city`. Within each time
chunk, Druid assigns rows to segments based on `country`, and then within each
of those segments, Druid sorts those rows by `__time` first, then `city`, then
`country`. | empty list |
-| `maxParseExceptions`| SELECT, INSERT, REPLACE<br /><br />Maximum number of
parse exceptions that are ignored while executing the query before it stops
with `TooManyWarningsFault`. To ignore all the parse exceptions, set the value
to -1.
| 0 |
-| `rowsPerSegment` | INSERT or REPLACE<br /><br />The number of rows per
segment to target. The actual number of rows per segment may be somewhat higher
or lower than this number. In most cases, use the default. For general
information about sizing rows per segment, see [Segment Size
Optimization](../operations/segment-optimization.md).
| 3,000,000 |
-| `indexSpec` | INSERT or REPLACE<br /><br />An
[`indexSpec`](../ingestion/ingestion-spec.md#indexspec) to use when generating
segments. May be a JSON string or object. See [Front
coding](../ingestion/ingestion-spec.md#front-coding) for details on configuring
an `indexSpec` with front coding.
| See
[`indexSpec`](../ingestion/ingestion-spec.md#indexspec). |
-| `durableShuffleStorage` | SELECT, INSERT, REPLACE <br /><br />Whether to use
durable storage for shuffle mesh. To use this feature, configure the durable
storage at the server level using
`druid.msq.intermediate.storage.enable=true`). If these properties are not
configured, any query with the context variable `durableShuffleStorage=true`
fails with a configuration error. <br /><br />
| `false` |
-| `faultTolerance` | SELECT, INSERT, REPLACE<br /><br /> Whether to turn on
fault tolerance mode or not. Failed workers are retried based on
[Limits](#limits). Cannot be used when `durableShuffleStorage` is explicitly
set to false.
| `false` |
-| `selectDestination` | SELECT<br /><br /> Controls where the final result of
the select query is written. <br />Use `taskReport`(the default) to write
select results to the task report. <b> This is not scalable since task reports
size explodes for large results </b> <br/>Use `durableStorage` to write results
to durable storage location. <b>For large results sets, its recommended to use
`durableStorage` </b>. To configure durable storage see
[`this`](#durable-storage) section.
| `taskReport` |
-| `waitUntilSegmentsLoad` | INSERT, REPLACE<br /><br /> If set, the ingest
query waits for the generated segment to be loaded before exiting, else the
ingest query exits without waiting. The task and live reports contain the
information about the status of loading segments if this flag is set. This will
ensure that any future queries made after the ingestion exits will include
results from the ingestion. The drawback is that the controller task will stall
until the segments are loaded.
| `false` |
-| `includeSegmentSource` | SELECT, INSERT, REPLACE<br /><br /> Controls the
sources, which will be queried for results in addition to the segments present
on deep storage. Can be `NONE` or `REALTIME`. If this value is `NONE`, only
non-realtime (published and used) segments will be downloaded from deep
storage. If this value is `REALTIME`, results will also be included from
realtime tasks.
| `NONE` |
-| `rowsPerPage` | SELECT<br /><br />The number of rows per page to target. The
actual number of rows per page may be somewhat higher or lower than this
number. In most cases, use the default.<br /> This property comes into effect
only when `selectDestination` is set to `durableStorage`
| 100000 |
+| `maxParseExceptions`| SELECT, INSERT, REPLACE<br /><br />Maximum number of
parse exceptions that are ignored while executing the query before it stops
with `TooManyWarningsFault`. To ignore all the parse exceptions, set the value
to -1. | 0 |
+| `rowsPerSegment` | INSERT or REPLACE<br /><br />The number of rows per
segment to target. The actual number of rows per segment may be somewhat higher
or lower than this number. In most cases, use the default. For general
information about sizing rows per segment, see [Segment Size
Optimization](../operations/segment-optimization.md). | 3,000,000 |
+| `indexSpec` | INSERT or REPLACE<br /><br />An
[`indexSpec`](../ingestion/ingestion-spec.md#indexspec) to use when generating
segments. May be a JSON string or object. See [Front
coding](../ingestion/ingestion-spec.md#front-coding) for details on configuring
an `indexSpec` with front coding. | See
[`indexSpec`](../ingestion/ingestion-spec.md#indexspec). |
+| `durableShuffleStorage` | SELECT, INSERT, REPLACE<br /><br />Whether to use durable storage for shuffle mesh. To use this feature, configure durable storage at the server level by setting `druid.msq.intermediate.storage.enable=true`. If this property is not configured, any query with the context variable `durableShuffleStorage=true` fails with a configuration error. | `false` |
+| `faultTolerance` | SELECT, INSERT, REPLACE<br /><br />Whether to turn on fault tolerance mode. Failed workers are retried based on [Limits](#limits). Cannot be used when `durableShuffleStorage` is explicitly set to false. | `false` |
+| `selectDestination` | SELECT<br /><br />Controls where the final result of the SELECT query is written. <br />Use `taskReport` (the default) to write select results to the task report. <b>This is not scalable, since the task report size grows very large for large result sets.</b> <br/>Use `durableStorage` to write results to a durable storage location. <b>For large result sets, it's recommended to use `durableStorage`.</b> To configure durable storage, see the [durable storage](#durable-storage) section. | `taskReport` |
+| `waitUntilSegmentsLoad` | INSERT, REPLACE<br /><br />If set, the ingest query waits for the generated segments to be loaded before exiting; otherwise, it exits without waiting. When this flag is set, the task and live reports contain information about the status of segment loading. This ensures that any queries made after the ingestion exits include results from the ingestion. The drawback is that the controller task stalls until the segments are loaded. | `false` |
+| `includeSegmentSource` | SELECT, INSERT, REPLACE<br /><br />Controls the sources that are queried for results, in addition to the segments present on deep storage. Can be `NONE` or `REALTIME`. If `NONE`, only non-realtime (published and used) segments are downloaded from deep storage. If `REALTIME`, results are also included from realtime tasks. | `NONE` |
+| `rowsPerPage` | SELECT<br /><br />The number of rows per page to target. The actual number of rows per page may be somewhat higher or lower than this number. In most cases, use the default.<br />This property takes effect only when `selectDestination` is set to `durableStorage`. | 100000 |
+| `arrayIngestMode` | INSERT, REPLACE<br /><br />Controls how ARRAY type values are stored in Druid segments. When set to `'array'` (recommended for SQL compliance), Druid stores all ARRAY typed values in [ARRAY typed columns](../querying/arrays.md) and supports both VARCHAR and numeric typed arrays. When set to `'mvd'` (the default, for backwards compatibility), Druid supports only VARCHAR typed arrays and stores them as [multi-value string columns](../querying/multi-value-dimensions.md). When set to `'none'`, Druid throws an exception when attempting to store any type of array. The `'none'` mode is useful for migrating from `'mvd'` to `'array'`, since it forces query writers to make an explicit choice between ARRAY and multi-value VARCHAR typed columns. | `'mvd'` (for backwards compatibility; `'array'` is recommended for SQL compliance) |
Review Comment:
These are sorted vaguely in order of how commonly people would need to
change them (more common near the top). This one is probably going to be pretty
common, since it's needed for correct SQL behavior with arrays. I suggest
putting it after `finalizeAggregations`. Both of those are pretty big
influences on the meaning of an insert statement.
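To illustrate the comment above, a hedged sketch of how this context parameter might appear alongside `finalizeAggregations` in an MSQ SQL task payload (the table name and query are placeholders, not from the PR):

```json
{
  "query": "INSERT INTO \"example_table\" SELECT ... PARTITIONED BY DAY",
  "context": {
    "finalizeAggregations": false,
    "arrayIngestMode": "array"
  }
}
```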
##########
docs/querying/arrays.md:
##########
@@ -0,0 +1,228 @@
+---
+id: arrays
+title: "Array columns"
+---
+
+<!--
+ ~ Licensed to the Apache Software Foundation (ASF) under one
+ ~ or more contributor license agreements. See the NOTICE file
+ ~ distributed with this work for additional information
+ ~ regarding copyright ownership. The ASF licenses this file
+ ~ to you under the Apache License, Version 2.0 (the
+ ~ "License"); you may not use this file except in compliance
+ ~ with the License. You may obtain a copy of the License at
+ ~
+ ~ http://www.apache.org/licenses/LICENSE-2.0
+ ~
+ ~ Unless required by applicable law or agreed to in writing,
+ ~ software distributed under the License is distributed on an
+ ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ ~ KIND, either express or implied. See the License for the
+ ~ specific language governing permissions and limitations
+ ~ under the License.
+ -->
+
+
+Apache Druid supports SQL standard `ARRAY` typed columns for `STRING`, `LONG`, and `DOUBLE` types. Other, more complicated ARRAY types must be stored in [nested columns](nested-columns.md). Druid ARRAY types are distinct from [multi-value dimensions](multi-value-dimensions.md), which have significantly different behavior from standard arrays.
+
+This document describes inserting, filtering, and grouping behavior for
`ARRAY` typed columns.
+Refer to the [Druid SQL data type documentation](sql-data-types.md#arrays) and
[SQL array function reference](sql-array-functions.md) for additional details
+about the functions available to use with ARRAY columns and types in SQL.
+
+The following sections describe inserting, filtering, and grouping behavior
based on the following example data, which includes 3 array typed columns.
+
+```json lines
+{"timestamp": "2023-01-01T00:00:00", "label": "row1", "arrayString": ["a",
"b"], "arrayLong":[1, null,3], "arrayDouble":[1.1, 2.2, null]}
+{"timestamp": "2023-01-01T00:00:00", "label": "row2", "arrayString": [null,
"b"], "arrayLong":null, "arrayDouble":[999, null, 5.5]}
+{"timestamp": "2023-01-01T00:00:00", "label": "row3", "arrayString": [],
"arrayLong":[1, 2, 3], "arrayDouble":[null, 2.2, 1.1]}
+{"timestamp": "2023-01-01T00:00:00", "label": "row4", "arrayString": ["a",
"b"], "arrayLong":[1, 2, 3], "arrayDouble":[]}
+{"timestamp": "2023-01-01T00:00:00", "label": "row5", "arrayString": null,
"arrayLong":[], "arrayDouble":null}
+```
+
+## Overview
+
+When using [native ingestion](../ingestion/native-batch.md), arrays can be
ingested using the [`"auto"`](../ingestion/ingestion-spec.md#dimension-objects)
type dimension schema which is shared with [type-aware schema
discovery](../ingestion/schema-design.md#type-aware-schema-discovery).
Review Comment:
Native batch ingestion, or streaming ingestion.
##########
docs/querying/arrays.md:
##########
@@ -0,0 +1,228 @@
+---
+id: arrays
+title: "Array columns"
+---
+
+<!--
+ ~ Licensed to the Apache Software Foundation (ASF) under one
+ ~ or more contributor license agreements. See the NOTICE file
+ ~ distributed with this work for additional information
+ ~ regarding copyright ownership. The ASF licenses this file
+ ~ to you under the Apache License, Version 2.0 (the
+ ~ "License"); you may not use this file except in compliance
+ ~ with the License. You may obtain a copy of the License at
+ ~
+ ~ http://www.apache.org/licenses/LICENSE-2.0
+ ~
+ ~ Unless required by applicable law or agreed to in writing,
+ ~ software distributed under the License is distributed on an
+ ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ ~ KIND, either express or implied. See the License for the
+ ~ specific language governing permissions and limitations
+ ~ under the License.
+ -->
+
+
+Apache Druid supports SQL standard `ARRAY` typed columns for `STRING`, `LONG`, and `DOUBLE` types. Other, more complicated ARRAY types must be stored in [nested columns](nested-columns.md). Druid ARRAY types are distinct from [multi-value dimensions](multi-value-dimensions.md), which have significantly different behavior from standard arrays.
+
+This document describes inserting, filtering, and grouping behavior for
`ARRAY` typed columns.
+Refer to the [Druid SQL data type documentation](sql-data-types.md#arrays) and
[SQL array function reference](sql-array-functions.md) for additional details
+about the functions available to use with ARRAY columns and types in SQL.
+
+The following sections describe inserting, filtering, and grouping behavior
based on the following example data, which includes 3 array typed columns.
+
+```json lines
+{"timestamp": "2023-01-01T00:00:00", "label": "row1", "arrayString": ["a",
"b"], "arrayLong":[1, null,3], "arrayDouble":[1.1, 2.2, null]}
+{"timestamp": "2023-01-01T00:00:00", "label": "row2", "arrayString": [null,
"b"], "arrayLong":null, "arrayDouble":[999, null, 5.5]}
+{"timestamp": "2023-01-01T00:00:00", "label": "row3", "arrayString": [],
"arrayLong":[1, 2, 3], "arrayDouble":[null, 2.2, 1.1]}
+{"timestamp": "2023-01-01T00:00:00", "label": "row4", "arrayString": ["a",
"b"], "arrayLong":[1, 2, 3], "arrayDouble":[]}
+{"timestamp": "2023-01-01T00:00:00", "label": "row5", "arrayString": null,
"arrayLong":[], "arrayDouble":null}
+```
+
+## Overview
Review Comment:
"Ingesting arrays" rather than "Overview"?
In the ToC it will show up right above "Querying arrays" so there's a nice
symmetry there
##########
docs/querying/arrays.md:
##########
@@ -0,0 +1,228 @@
+---
+id: arrays
+title: "Array columns"
+---
+
+<!--
+ ~ Licensed to the Apache Software Foundation (ASF) under one
+ ~ or more contributor license agreements. See the NOTICE file
+ ~ distributed with this work for additional information
+ ~ regarding copyright ownership. The ASF licenses this file
+ ~ to you under the Apache License, Version 2.0 (the
+ ~ "License"); you may not use this file except in compliance
+ ~ with the License. You may obtain a copy of the License at
+ ~
+ ~ http://www.apache.org/licenses/LICENSE-2.0
+ ~
+ ~ Unless required by applicable law or agreed to in writing,
+ ~ software distributed under the License is distributed on an
+ ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ ~ KIND, either express or implied. See the License for the
+ ~ specific language governing permissions and limitations
+ ~ under the License.
+ -->
+
+
+Apache Druid supports SQL standard `ARRAY` typed columns for `STRING`, `LONG`, and `DOUBLE` types. Other, more complicated ARRAY types must be stored in [nested columns](nested-columns.md). Druid ARRAY types are distinct from [multi-value dimensions](multi-value-dimensions.md), which have significantly different behavior from standard arrays.
+
+This document describes inserting, filtering, and grouping behavior for
`ARRAY` typed columns.
+Refer to the [Druid SQL data type documentation](sql-data-types.md#arrays) and
[SQL array function reference](sql-array-functions.md) for additional details
+about the functions available to use with ARRAY columns and types in SQL.
+
+The following sections describe inserting, filtering, and grouping behavior
based on the following example data, which includes 3 array typed columns.
+
+```json lines
+{"timestamp": "2023-01-01T00:00:00", "label": "row1", "arrayString": ["a",
"b"], "arrayLong":[1, null,3], "arrayDouble":[1.1, 2.2, null]}
+{"timestamp": "2023-01-01T00:00:00", "label": "row2", "arrayString": [null,
"b"], "arrayLong":null, "arrayDouble":[999, null, 5.5]}
+{"timestamp": "2023-01-01T00:00:00", "label": "row3", "arrayString": [],
"arrayLong":[1, 2, 3], "arrayDouble":[null, 2.2, 1.1]}
+{"timestamp": "2023-01-01T00:00:00", "label": "row4", "arrayString": ["a",
"b"], "arrayLong":[1, 2, 3], "arrayDouble":[]}
+{"timestamp": "2023-01-01T00:00:00", "label": "row5", "arrayString": null,
"arrayLong":[], "arrayDouble":null}
+```
+
+## Overview
+
+When using [native ingestion](../ingestion/native-batch.md), arrays can be
ingested using the [`"auto"`](../ingestion/ingestion-spec.md#dimension-objects)
type dimension schema which is shared with [type-aware schema
discovery](../ingestion/schema-design.md#type-aware-schema-discovery).
+
+When ingesting from TSV or CSV data, you can specify the array delimiter using the `listDelimiter` field in the `inputFormat`. JSON data must be formatted as a JSON array to be ingested as an array type; JSON data does not require any special `inputFormat` configuration.
+
+The following shows an example `dimensionsSpec` for native ingestion of the
data used in this document:
+
+```json
+"dimensions": [
+ {
+ "type": "auto",
+ "name": "label"
+ },
+ {
+ "type": "auto",
+ "name": "arrayString"
+ },
+ {
+ "type": "auto",
+ "name": "arrayLong"
+ },
+ {
+ "type": "auto",
+ "name": "arrayDouble"
+ }
+],
+```
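As a companion to the `listDelimiter` note above, a hedged sketch of a TSV `inputFormat` that would parse pipe-delimited arrays (the delimiter and header setting are illustrative assumptions, not from the PR):

```json
{
  "type": "tsv",
  "findColumnsFromHeader": true,
  "listDelimiter": "|"
}
```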
+
+Arrays can also be inserted with [multi-stage ingestion](../multi-stage-query/index.md), but the query must include the context parameter `"arrayIngestMode":"array"`.
+
+For example, to insert the data used in this document:
+```sql
+REPLACE INTO "array_example" OVERWRITE ALL
+WITH "ext" AS (
+ SELECT *
+ FROM TABLE(
+ EXTERN(
+ '{"type":"inline","data":"{\"timestamp\": \"2023-01-01T00:00:00\",
\"label\": \"row1\", \"arrayString\": [\"a\", \"b\"], \"arrayLong\":[1,
null,3], \"arrayDouble\":[1.1, 2.2, null]}\n{\"timestamp\":
\"2023-01-01T00:00:00\", \"label\": \"row2\", \"arrayString\": [null, \"b\"],
\"arrayLong\":null, \"arrayDouble\":[999, null, 5.5]}\n{\"timestamp\":
\"2023-01-01T00:00:00\", \"label\": \"row3\", \"arrayString\": [],
\"arrayLong\":[1, 2, 3], \"arrayDouble\":[null, 2.2, 1.1]} \n{\"timestamp\":
\"2023-01-01T00:00:00\", \"label\": \"row4\", \"arrayString\": [\"a\", \"b\"],
\"arrayLong\":[1, 2, 3], \"arrayDouble\":[]}\n{\"timestamp\":
\"2023-01-01T00:00:00\", \"label\": \"row5\", \"arrayString\": null,
\"arrayLong\":[], \"arrayDouble\":null}"}',
+ '{"type":"json"}',
+ '[{"name":"timestamp", "type":"STRING"},{"name":"label",
"type":"STRING"},{"name":"arrayString",
"type":"ARRAY<STRING>"},{"name":"arrayLong",
"type":"ARRAY<LONG>"},{"name":"arrayDouble", "type":"ARRAY<DOUBLE>"}]'
+ )
+ )
+)
+SELECT
+ TIME_PARSE("timestamp") AS "__time",
+ "label",
+ "arrayString",
+ "arrayLong",
+ "arrayDouble"
+FROM "ext"
+PARTITIONED BY DAY
+```
+
+These input arrays can also be grouped for rollup:
+
+```sql
+REPLACE INTO "array_example_rollup" OVERWRITE ALL
+WITH "ext" AS (
+ SELECT *
+ FROM TABLE(
+ EXTERN(
+ '{"type":"inline","data":"{\"timestamp\": \"2023-01-01T00:00:00\",
\"label\": \"row1\", \"arrayString\": [\"a\", \"b\"], \"arrayLong\":[1,
null,3], \"arrayDouble\":[1.1, 2.2, null]}\n{\"timestamp\":
\"2023-01-01T00:00:00\", \"label\": \"row2\", \"arrayString\": [null, \"b\"],
\"arrayLong\":null, \"arrayDouble\":[999, null, 5.5]}\n{\"timestamp\":
\"2023-01-01T00:00:00\", \"label\": \"row3\", \"arrayString\": [],
\"arrayLong\":[1, 2, 3], \"arrayDouble\":[null, 2.2, 1.1]} \n{\"timestamp\":
\"2023-01-01T00:00:00\", \"label\": \"row4\", \"arrayString\": [\"a\", \"b\"],
\"arrayLong\":[1, 2, 3], \"arrayDouble\":[]}\n{\"timestamp\":
\"2023-01-01T00:00:00\", \"label\": \"row5\", \"arrayString\": null,
\"arrayLong\":[], \"arrayDouble\":null}"}',
+ '{"type":"json"}',
+ '[{"name":"timestamp", "type":"STRING"},{"name":"label",
"type":"STRING"},{"name":"arrayString",
"type":"ARRAY<STRING>"},{"name":"arrayLong",
"type":"ARRAY<LONG>"},{"name":"arrayDouble", "type":"ARRAY<DOUBLE>"}]'
+ )
+ )
+)
+SELECT
+ TIME_PARSE("timestamp") AS "__time",
+ "label",
+ "arrayString",
+ "arrayLong",
+ "arrayDouble"
+FROM "ext"
+GROUP BY 1,2,3,4,5
+PARTITIONED BY DAY
+```
+
+
+## Querying arrays
+
+### Filtering
+
+All query types, as well as [filtered
aggregators](aggregations.md#filtered-aggregator), can filter on array typed
columns. Filters follow these rules for array types:
+
+- Value filters, such as "equality" and "range", match on the entire array value
+- The "null" filter matches rows where the entire array value is null
+- Array-specific functions such as `ARRAY_CONTAINS` and `ARRAY_OVERLAP` follow the behavior specified by those functions
+- All other filters do not directly support ARRAY types
+
+#### Example: equality
+```sql
+SELECT *
+FROM "array_example"
+WHERE arrayLong = ARRAY[1,2,3]
+```
+
+```json lines
+{"__time":"2023-01-01T00:00:00.000Z","label":"row3","arrayString":"[]","arrayLong":"[1,2,3]","arrayDouble":"[null,2.2,1.1]"}
+{"__time":"2023-01-01T00:00:00.000Z","label":"row4","arrayString":"[\"a\",\"b\"]","arrayLong":"[1,2,3]","arrayDouble":"[]"}
+```
+
+#### Example: null
+```sql
+SELECT *
+FROM "array_example"
+WHERE arrayLong is null
+```
+
+```json lines
+{"__time":"2023-01-01T00:00:00.000Z","label":"row2","arrayString":"[null,\"b\"]","arrayLong":null,"arrayDouble":"[999.0,null,5.5]"}
+```
+
+#### Example: range
+```sql
+SELECT *
+FROM "array_example"
+WHERE arrayString >= ARRAY['a','b']
+```
+
+```json lines
+{"__time":"2023-01-01T00:00:00.000Z","label":"row1","arrayString":"[\"a\",\"b\"]","arrayLong":"[1,null,3]","arrayDouble":"[1.1,2.2,null]"}
+{"__time":"2023-01-01T00:00:00.000Z","label":"row4","arrayString":"[\"a\",\"b\"]","arrayLong":"[1,2,3]","arrayDouble":"[]"}
+```
+
+#### Example: ARRAY_CONTAINS
+```sql
+SELECT *
+FROM "array_example"
+WHERE ARRAY_CONTAINS(arrayString, 'a')
+```
+
+```json lines
+{"__time":"2023-01-01T00:00:00.000Z","label":"row1","arrayString":"[\"a\",\"b\"]","arrayLong":"[1,null,3]","arrayDouble":"[1.1,2.2,null]"}
+{"__time":"2023-01-01T00:00:00.000Z","label":"row4","arrayString":"[\"a\",\"b\"]","arrayLong":"[1,2,3]","arrayDouble":"[]"}
+```
+
+### Grouping
+
+When grouping on an array with SQL or a native [groupBy query](groupbyquery.md), grouping follows standard SQL behavior and groups on the entire array as a single value. The [`UNNEST`](sql.md#unnest) function allows grouping on the individual array elements.
+
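The `UNNEST` behavior described above can be sketched as follows (a hypothetical query against the example datasource; results depend on the rows loaded):

```sql
SELECT "strings", COUNT(*) AS "element_count"
FROM "array_example" CROSS JOIN UNNEST("arrayString") AS u("strings")
GROUP BY 1
```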
+#### Example: SQL grouping query with no filtering
+```sql
+SELECT label, arrayString
+FROM "array_example"
+GROUP BY 1,2
+```
+results in:
+```json lines
+{"label":"row1","arrayString":"[\"a\",\"b\"]"}
+{"label":"row2","arrayString":"[null,\"b\"]"}
+{"label":"row3","arrayString":"[]"}
+{"label":"row4","arrayString":"[\"a\",\"b\"]"}
+{"label":"row5","arrayString":null}
+```
+
+#### Example: SQL grouping query with a filter
+```sql
+SELECT label, arrayString
+FROM "array_example" CROSS JOIN UNNEST(arrayString) as u(strings)
Review Comment:
is this `CROSS JOIN` meant to be here? it doesn't seem to be doing much if
anything
##########
docs/querying/arrays.md:
##########
@@ -0,0 +1,228 @@
+---
+id: arrays
+title: "Array columns"
+---
+
+<!--
+ ~ Licensed to the Apache Software Foundation (ASF) under one
+ ~ or more contributor license agreements. See the NOTICE file
+ ~ distributed with this work for additional information
+ ~ regarding copyright ownership. The ASF licenses this file
+ ~ to you under the Apache License, Version 2.0 (the
+ ~ "License"); you may not use this file except in compliance
+ ~ with the License. You may obtain a copy of the License at
+ ~
+ ~ http://www.apache.org/licenses/LICENSE-2.0
+ ~
+ ~ Unless required by applicable law or agreed to in writing,
+ ~ software distributed under the License is distributed on an
+ ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ ~ KIND, either express or implied. See the License for the
+ ~ specific language governing permissions and limitations
+ ~ under the License.
+ -->
+
+
+Apache Druid supports SQL standard `ARRAY` typed columns for `STRING`, `LONG`, and `DOUBLE` types. Other, more complicated ARRAY types must be stored in [nested columns](nested-columns.md). Druid ARRAY types are distinct from [multi-value dimensions](multi-value-dimensions.md), which have significantly different behavior from standard arrays.
+
+This document describes inserting, filtering, and grouping behavior for
`ARRAY` typed columns.
+Refer to the [Druid SQL data type documentation](sql-data-types.md#arrays) and
[SQL array function reference](sql-array-functions.md) for additional details
+about the functions available to use with ARRAY columns and types in SQL.
+
+The following sections describe inserting, filtering, and grouping behavior
based on the following example data, which includes 3 array typed columns.
+
+```json lines
+{"timestamp": "2023-01-01T00:00:00", "label": "row1", "arrayString": ["a",
"b"], "arrayLong":[1, null,3], "arrayDouble":[1.1, 2.2, null]}
+{"timestamp": "2023-01-01T00:00:00", "label": "row2", "arrayString": [null,
"b"], "arrayLong":null, "arrayDouble":[999, null, 5.5]}
+{"timestamp": "2023-01-01T00:00:00", "label": "row3", "arrayString": [],
"arrayLong":[1, 2, 3], "arrayDouble":[null, 2.2, 1.1]}
+{"timestamp": "2023-01-01T00:00:00", "label": "row4", "arrayString": ["a",
"b"], "arrayLong":[1, 2, 3], "arrayDouble":[]}
+{"timestamp": "2023-01-01T00:00:00", "label": "row5", "arrayString": null,
"arrayLong":[], "arrayDouble":null}
+```
+
+## Overview
+
Review Comment:
Use separate `###` sections for "Ingestion specs" and "SQL"
##########
docs/querying/arrays.md:
##########
@@ -0,0 +1,228 @@
+---
+id: arrays
+title: "Array columns"
+---
+
+<!--
+ ~ Licensed to the Apache Software Foundation (ASF) under one
+ ~ or more contributor license agreements. See the NOTICE file
+ ~ distributed with this work for additional information
+ ~ regarding copyright ownership. The ASF licenses this file
+ ~ to you under the Apache License, Version 2.0 (the
+ ~ "License"); you may not use this file except in compliance
+ ~ with the License. You may obtain a copy of the License at
+ ~
+ ~ http://www.apache.org/licenses/LICENSE-2.0
+ ~
+ ~ Unless required by applicable law or agreed to in writing,
+ ~ software distributed under the License is distributed on an
+ ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ ~ KIND, either express or implied. See the License for the
+ ~ specific language governing permissions and limitations
+ ~ under the License.
+ -->
+
+
+Apache Druid supports SQL standard `ARRAY` typed columns for `STRING`, `LONG`, and `DOUBLE` types. Other, more complicated ARRAY types must be stored in [nested columns](nested-columns.md). Druid ARRAY types are distinct from [multi-value dimensions](multi-value-dimensions.md), which have significantly different behavior from standard arrays.
+
+This document describes inserting, filtering, and grouping behavior for
`ARRAY` typed columns.
+Refer to the [Druid SQL data type documentation](sql-data-types.md#arrays) and
[SQL array function reference](sql-array-functions.md) for additional details
+about the functions available to use with ARRAY columns and types in SQL.
+
+The following sections describe inserting, filtering, and grouping behavior
based on the following example data, which includes 3 array typed columns.
+
+```json lines
+{"timestamp": "2023-01-01T00:00:00", "label": "row1", "arrayString": ["a",
"b"], "arrayLong":[1, null,3], "arrayDouble":[1.1, 2.2, null]}
+{"timestamp": "2023-01-01T00:00:00", "label": "row2", "arrayString": [null,
"b"], "arrayLong":null, "arrayDouble":[999, null, 5.5]}
+{"timestamp": "2023-01-01T00:00:00", "label": "row3", "arrayString": [],
"arrayLong":[1, 2, 3], "arrayDouble":[null, 2.2, 1.1]}
+{"timestamp": "2023-01-01T00:00:00", "label": "row4", "arrayString": ["a",
"b"], "arrayLong":[1, 2, 3], "arrayDouble":[]}
+{"timestamp": "2023-01-01T00:00:00", "label": "row5", "arrayString": null,
"arrayLong":[], "arrayDouble":null}
+```
+
+## Overview
+
+When using [native ingestion](../ingestion/native-batch.md), arrays can be
ingested using the [`"auto"`](../ingestion/ingestion-spec.md#dimension-objects)
type dimension schema which is shared with [type-aware schema
discovery](../ingestion/schema-design.md#type-aware-schema-discovery).
+
+When ingesting from TSV or CSV data, you can specify the array delimiter using the `listDelimiter` field in the `inputFormat`. JSON data must be formatted as a JSON array to be ingested as an array type; JSON data does not require any special `inputFormat` configuration.
+
+The following shows an example `dimensionsSpec` for native ingestion of the
data used in this document:
+
+```json
+"dimensions": [
+ {
+ "type": "auto",
+ "name": "label"
+ },
+ {
+ "type": "auto",
+ "name": "arrayString"
+ },
+ {
+ "type": "auto",
+ "name": "arrayLong"
+ },
+ {
+ "type": "auto",
+ "name": "arrayDouble"
+ }
+],
+```
+
+Arrays can also be inserted with [multi-stage ingestion](../multi-stage-query/index.md), but the query must include the context parameter `"arrayIngestMode":"array"`.
+
+For example, to insert the data used in this document:
+```sql
+REPLACE INTO "array_example" OVERWRITE ALL
+WITH "ext" AS (
+ SELECT *
+ FROM TABLE(
+ EXTERN(
+ '{"type":"inline","data":"{\"timestamp\": \"2023-01-01T00:00:00\",
\"label\": \"row1\", \"arrayString\": [\"a\", \"b\"], \"arrayLong\":[1,
null,3], \"arrayDouble\":[1.1, 2.2, null]}\n{\"timestamp\":
\"2023-01-01T00:00:00\", \"label\": \"row2\", \"arrayString\": [null, \"b\"],
\"arrayLong\":null, \"arrayDouble\":[999, null, 5.5]}\n{\"timestamp\":
\"2023-01-01T00:00:00\", \"label\": \"row3\", \"arrayString\": [],
\"arrayLong\":[1, 2, 3], \"arrayDouble\":[null, 2.2, 1.1]} \n{\"timestamp\":
\"2023-01-01T00:00:00\", \"label\": \"row4\", \"arrayString\": [\"a\", \"b\"],
\"arrayLong\":[1, 2, 3], \"arrayDouble\":[]}\n{\"timestamp\":
\"2023-01-01T00:00:00\", \"label\": \"row5\", \"arrayString\": null,
\"arrayLong\":[], \"arrayDouble\":null}"}',
+ '{"type":"json"}',
+ '[{"name":"timestamp", "type":"STRING"},{"name":"label",
"type":"STRING"},{"name":"arrayString",
"type":"ARRAY<STRING>"},{"name":"arrayLong",
"type":"ARRAY<LONG>"},{"name":"arrayDouble", "type":"ARRAY<DOUBLE>"}]'
+ )
+ )
+)
+SELECT
+ TIME_PARSE("timestamp") AS "__time",
+ "label",
+ "arrayString",
+ "arrayLong",
+ "arrayDouble"
+FROM "ext"
+PARTITIONED BY DAY
+```
+
+These input arrays can also be grouped for rollup:
+
+```sql
+REPLACE INTO "array_example_rollup" OVERWRITE ALL
+WITH "ext" AS (
+ SELECT *
+ FROM TABLE(
+ EXTERN(
+ '{"type":"inline","data":"{\"timestamp\": \"2023-01-01T00:00:00\",
\"label\": \"row1\", \"arrayString\": [\"a\", \"b\"], \"arrayLong\":[1,
null,3], \"arrayDouble\":[1.1, 2.2, null]}\n{\"timestamp\":
\"2023-01-01T00:00:00\", \"label\": \"row2\", \"arrayString\": [null, \"b\"],
\"arrayLong\":null, \"arrayDouble\":[999, null, 5.5]}\n{\"timestamp\":
\"2023-01-01T00:00:00\", \"label\": \"row3\", \"arrayString\": [],
\"arrayLong\":[1, 2, 3], \"arrayDouble\":[null, 2.2, 1.1]} \n{\"timestamp\":
\"2023-01-01T00:00:00\", \"label\": \"row4\", \"arrayString\": [\"a\", \"b\"],
\"arrayLong\":[1, 2, 3], \"arrayDouble\":[]}\n{\"timestamp\":
\"2023-01-01T00:00:00\", \"label\": \"row5\", \"arrayString\": null,
\"arrayLong\":[], \"arrayDouble\":null}"}',
+ '{"type":"json"}',
+ '[{"name":"timestamp", "type":"STRING"},{"name":"label",
"type":"STRING"},{"name":"arrayString",
"type":"ARRAY<STRING>"},{"name":"arrayLong",
"type":"ARRAY<LONG>"},{"name":"arrayDouble", "type":"ARRAY<DOUBLE>"}]'
+ )
+ )
+)
+SELECT
+ TIME_PARSE("timestamp") AS "__time",
+ "label",
+ "arrayString",
+ "arrayLong",
+ "arrayDouble"
+FROM "ext"
+GROUP BY 1,2,3,4,5
+PARTITIONED BY DAY
+```
+
+
+## Querying arrays
+
+### Filtering
+
+All query types, as well as [filtered
aggregators](aggregations.md#filtered-aggregator), can filter on array typed
columns. Filters follow these rules for array types:
+
+- Value filters, such as "equality" and "range", match on the entire array value
+- The "null" filter matches rows where the entire array value is null
+- Array-specific functions such as `ARRAY_CONTAINS` and `ARRAY_OVERLAP` follow the behavior specified by those functions
+- All other filters do not directly support ARRAY types
Review Comment:
What does "do not directly support" mean? Error? Treated as null? Undefined
behavior? Should be specific.
##########
docs/querying/arrays.md:
##########
@@ -0,0 +1,228 @@
+---
+id: arrays
+title: "Array columns"
+---
+
+<!--
+ ~ Licensed to the Apache Software Foundation (ASF) under one
+ ~ or more contributor license agreements. See the NOTICE file
+ ~ distributed with this work for additional information
+ ~ regarding copyright ownership. The ASF licenses this file
+ ~ to you under the Apache License, Version 2.0 (the
+ ~ "License"); you may not use this file except in compliance
+ ~ with the License. You may obtain a copy of the License at
+ ~
+ ~ http://www.apache.org/licenses/LICENSE-2.0
+ ~
+ ~ Unless required by applicable law or agreed to in writing,
+ ~ software distributed under the License is distributed on an
+ ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ ~ KIND, either express or implied. See the License for the
+ ~ specific language governing permissions and limitations
+ ~ under the License.
+ -->
+
+
+Apache Druid supports SQL standard `ARRAY` typed columns for `STRING`, `LONG`,
and `DOUBLE` types. Other more complicated ARRAY types must be stored in
[nested columns](nested-columns.md). Druid ARRAY types are distinct from
[multi-value dimensions](multi-value-dimensions.md), which have significantly
different behavior than standard arrays.
Review Comment:
Odd to be mixing SQL terms with native type names here. Possibly switch to
`VARCHAR`, `BIGINT`, and `DOUBLE` for making more sense with SQL. Or mention
both the SQL name and the native name?
##########
docs/querying/arrays.md:
##########
@@ -0,0 +1,228 @@
+---
+id: arrays
+title: "Array columns"
+---
+
+<!--
+ ~ Licensed to the Apache Software Foundation (ASF) under one
+ ~ or more contributor license agreements. See the NOTICE file
+ ~ distributed with this work for additional information
+ ~ regarding copyright ownership. The ASF licenses this file
+ ~ to you under the Apache License, Version 2.0 (the
+ ~ "License"); you may not use this file except in compliance
+ ~ with the License. You may obtain a copy of the License at
+ ~
+ ~ http://www.apache.org/licenses/LICENSE-2.0
+ ~
+ ~ Unless required by applicable law or agreed to in writing,
+ ~ software distributed under the License is distributed on an
+ ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ ~ KIND, either express or implied. See the License for the
+ ~ specific language governing permissions and limitations
+ ~ under the License.
+ -->
+
+
+Apache Druid supports SQL standard `ARRAY` typed columns for `STRING`, `LONG`,
and `DOUBLE` types. Other more complicated ARRAY types must be stored in
[nested columns](nested-columns.md). Druid ARRAY types are distinct from
[multi-value dimensions](multi-value-dimensions.md), which have significantly
different behavior than standard arrays.
+
+This document describes inserting, filtering, and grouping behavior for
`ARRAY` typed columns.
+Refer to the [Druid SQL data type documentation](sql-data-types.md#arrays) and
[SQL array function reference](sql-array-functions.md) for additional details
+about the functions available to use with ARRAY columns and types in SQL.
+
+The following sections describe inserting, filtering, and grouping behavior
based on the following example data, which includes 3 array typed columns.
+
+```json lines
+{"timestamp": "2023-01-01T00:00:00", "label": "row1", "arrayString": ["a",
"b"], "arrayLong":[1, null,3], "arrayDouble":[1.1, 2.2, null]}
+{"timestamp": "2023-01-01T00:00:00", "label": "row2", "arrayString": [null,
"b"], "arrayLong":null, "arrayDouble":[999, null, 5.5]}
+{"timestamp": "2023-01-01T00:00:00", "label": "row3", "arrayString": [],
"arrayLong":[1, 2, 3], "arrayDouble":[null, 2.2, 1.1]}
+{"timestamp": "2023-01-01T00:00:00", "label": "row4", "arrayString": ["a",
"b"], "arrayLong":[1, 2, 3], "arrayDouble":[]}
+{"timestamp": "2023-01-01T00:00:00", "label": "row5", "arrayString": null,
"arrayLong":[], "arrayDouble":null}
+```
+
+## Overview
+
+When using [native ingestion](../ingestion/native-batch.md), arrays can be
ingested using the [`"auto"`](../ingestion/ingestion-spec.md#dimension-objects)
type dimension schema which is shared with [type-aware schema
discovery](../ingestion/schema-design.md#type-aware-schema-discovery).
+
+When ingesting from TSV or CSV data, you can specify the array delimiters
using the `listDelimiter` field in the `inputFormat`. JSON data must be
formatted as a JSON array to be ingested as an array type. JSON data does not
require `inputFormat` configuration.
+
+The following shows an example `dimensionsSpec` for native ingestion of the
data used in this document:
+
+```
+"dimensions": [
+ {
+ "type": "auto",
+ "name": "label"
+ },
+ {
+ "type": "auto",
+ "name": "arrayString"
+ },
+ {
+ "type": "auto",
+ "name": "arrayLong"
+ },
+ {
+ "type": "auto",
+ "name": "arrayDouble"
+ }
+],
+```
+
+Arrays can also be inserted with [multi-stage
ingestion](../multi-stage-query/index.md), but must include a query context
parameter `"arrayIngestMode":"array"`.
+
+For example, to insert the data used in this document:
+```sql
+REPLACE INTO "array_example" OVERWRITE ALL
+WITH "ext" AS (
+ SELECT *
+ FROM TABLE(
+ EXTERN(
+ '{"type":"inline","data":"{\"timestamp\": \"2023-01-01T00:00:00\",
\"label\": \"row1\", \"arrayString\": [\"a\", \"b\"], \"arrayLong\":[1,
null,3], \"arrayDouble\":[1.1, 2.2, null]}\n{\"timestamp\":
\"2023-01-01T00:00:00\", \"label\": \"row2\", \"arrayString\": [null, \"b\"],
\"arrayLong\":null, \"arrayDouble\":[999, null, 5.5]}\n{\"timestamp\":
\"2023-01-01T00:00:00\", \"label\": \"row3\", \"arrayString\": [],
\"arrayLong\":[1, 2, 3], \"arrayDouble\":[null, 2.2, 1.1]} \n{\"timestamp\":
\"2023-01-01T00:00:00\", \"label\": \"row4\", \"arrayString\": [\"a\", \"b\"],
\"arrayLong\":[1, 2, 3], \"arrayDouble\":[]}\n{\"timestamp\":
\"2023-01-01T00:00:00\", \"label\": \"row5\", \"arrayString\": null,
\"arrayLong\":[], \"arrayDouble\":null}"}',
+ '{"type":"json"}',
+ '[{"name":"timestamp", "type":"STRING"},{"name":"label",
"type":"STRING"},{"name":"arrayString",
"type":"ARRAY<STRING>"},{"name":"arrayLong",
"type":"ARRAY<LONG>"},{"name":"arrayDouble", "type":"ARRAY<DOUBLE>"}]'
+ )
+ )
+)
+SELECT
+ TIME_PARSE("timestamp") AS "__time",
+ "label",
+ "arrayString",
+ "arrayLong",
+ "arrayDouble"
+FROM "ext"
+PARTITIONED BY DAY
+```
+
+These input arrays can also be grouped for rollup:
+
+```sql
+REPLACE INTO "array_example_rollup" OVERWRITE ALL
+WITH "ext" AS (
+ SELECT *
+ FROM TABLE(
+ EXTERN(
+ '{"type":"inline","data":"{\"timestamp\": \"2023-01-01T00:00:00\",
\"label\": \"row1\", \"arrayString\": [\"a\", \"b\"], \"arrayLong\":[1,
null,3], \"arrayDouble\":[1.1, 2.2, null]}\n{\"timestamp\":
\"2023-01-01T00:00:00\", \"label\": \"row2\", \"arrayString\": [null, \"b\"],
\"arrayLong\":null, \"arrayDouble\":[999, null, 5.5]}\n{\"timestamp\":
\"2023-01-01T00:00:00\", \"label\": \"row3\", \"arrayString\": [],
\"arrayLong\":[1, 2, 3], \"arrayDouble\":[null, 2.2, 1.1]} \n{\"timestamp\":
\"2023-01-01T00:00:00\", \"label\": \"row4\", \"arrayString\": [\"a\", \"b\"],
\"arrayLong\":[1, 2, 3], \"arrayDouble\":[]}\n{\"timestamp\":
\"2023-01-01T00:00:00\", \"label\": \"row5\", \"arrayString\": null,
\"arrayLong\":[], \"arrayDouble\":null}"}',
+ '{"type":"json"}',
+ '[{"name":"timestamp", "type":"STRING"},{"name":"label",
"type":"STRING"},{"name":"arrayString",
"type":"ARRAY<STRING>"},{"name":"arrayLong",
"type":"ARRAY<LONG>"},{"name":"arrayDouble", "type":"ARRAY<DOUBLE>"}]'
+ )
+ )
+)
+SELECT
+ TIME_PARSE("timestamp") AS "__time",
+ "label",
+ "arrayString",
+ "arrayLong",
+ "arrayDouble"
+FROM "ext"
+GROUP BY 1,2,3,4,5
+PARTITIONED BY DAY
+```
+
+
+## Querying ARRAYS
Review Comment:
"Querying arrays"?
Don't see a reason to capitalize it here
##########
docs/querying/arrays.md:
##########
@@ -0,0 +1,228 @@
+---
+id: arrays
+title: "Array columns"
+---
+
+<!--
+ ~ Licensed to the Apache Software Foundation (ASF) under one
+ ~ or more contributor license agreements. See the NOTICE file
+ ~ distributed with this work for additional information
+ ~ regarding copyright ownership. The ASF licenses this file
+ ~ to you under the Apache License, Version 2.0 (the
+ ~ "License"); you may not use this file except in compliance
+ ~ with the License. You may obtain a copy of the License at
+ ~
+ ~ http://www.apache.org/licenses/LICENSE-2.0
+ ~
+ ~ Unless required by applicable law or agreed to in writing,
+ ~ software distributed under the License is distributed on an
+ ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ ~ KIND, either express or implied. See the License for the
+ ~ specific language governing permissions and limitations
+ ~ under the License.
+ -->
+
+
+Apache Druid supports SQL standard `ARRAY` typed columns for `STRING`, `LONG`,
and `DOUBLE` types. Other more complicated ARRAY types must be stored in
[nested columns](nested-columns.md). Druid ARRAY types are distinct from
[multi-value dimensions](multi-value-dimensions.md), which have significantly
different behavior than standard arrays.
+
+This document describes inserting, filtering, and grouping behavior for
`ARRAY` typed columns.
+Refer to the [Druid SQL data type documentation](sql-data-types.md#arrays) and
[SQL array function reference](sql-array-functions.md) for additional details
+about the functions available to use with ARRAY columns and types in SQL.
+
+The following sections describe inserting, filtering, and grouping behavior
based on the following example data, which includes 3 array typed columns.
+
+```json lines
+{"timestamp": "2023-01-01T00:00:00", "label": "row1", "arrayString": ["a",
"b"], "arrayLong":[1, null,3], "arrayDouble":[1.1, 2.2, null]}
+{"timestamp": "2023-01-01T00:00:00", "label": "row2", "arrayString": [null,
"b"], "arrayLong":null, "arrayDouble":[999, null, 5.5]}
+{"timestamp": "2023-01-01T00:00:00", "label": "row3", "arrayString": [],
"arrayLong":[1, 2, 3], "arrayDouble":[null, 2.2, 1.1]}
+{"timestamp": "2023-01-01T00:00:00", "label": "row4", "arrayString": ["a",
"b"], "arrayLong":[1, 2, 3], "arrayDouble":[]}
+{"timestamp": "2023-01-01T00:00:00", "label": "row5", "arrayString": null,
"arrayLong":[], "arrayDouble":null}
+```
+
+## Overview
+
+When using [native ingestion](../ingestion/native-batch.md), arrays can be
ingested using the [`"auto"`](../ingestion/ingestion-spec.md#dimension-objects)
type dimension schema which is shared with [type-aware schema
discovery](../ingestion/schema-design.md#type-aware-schema-discovery).
+
+When ingesting from TSV or CSV data, you can specify the array delimiters
using the `listDelimiter` field in the `inputFormat`. JSON data must be
formatted as a JSON array to be ingested as an array type. JSON data does not
require `inputFormat` configuration.
+
+The following shows an example `dimensionsSpec` for native ingestion of the
data used in this document:
+
+```
+"dimensions": [
+ {
+ "type": "auto",
+ "name": "label"
+ },
+ {
+ "type": "auto",
+ "name": "arrayString"
+ },
+ {
+ "type": "auto",
+ "name": "arrayLong"
+ },
+ {
+ "type": "auto",
+ "name": "arrayDouble"
+ }
+],
+```
+
+Arrays can also be inserted with [multi-stage
ingestion](../multi-stage-query/index.md), but must include a query context
parameter `"arrayIngestMode":"array"`.
Review Comment:
Sort of unclear what the verb "include" refers to. The sentence construction
makes it sound like the arrays themselves must include the context parameter.
But that isn't right. Also, "multi-stage ingestion" isn't a thing 🙂— it's
"SQL-based ingestion" or "multi-stage query".
So, suggestion:
> Arrays can also be inserted with [SQL-based
ingestion](../multi-stage-query/index.md) when you use the context parameter
`"arrayIngestMode": "array"`.
Also link the text `context parameter` to
`docs/multi-stage-query/reference.md#context`.
Also include some text about what will happen if you _don't_ do
`arrayIngestMode: array`. Something like: string arrays will be converted to
multi-value dimensions, and numeric arrays will cause the query to fail with an
error (what error?)
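For instance, the reference could show where the context parameter actually goes. A sketch of a SQL-based ingestion payload, assuming the `/druid/v2/sql/task` endpoint and abbreviating the query itself:

```json
{
  "query": "INSERT INTO \"array_example\" SELECT ... FROM TABLE(EXTERN(...)) PARTITIONED BY DAY",
  "context": {
    "arrayIngestMode": "array"
  }
}
```

The same `"arrayIngestMode": "array"` entry would go in the query context of whichever client is submitting the statement.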
##########
docs/querying/arrays.md:
##########
@@ -0,0 +1,228 @@
+---
+id: arrays
+title: "Array columns"
+---
+
+<!--
+ ~ Licensed to the Apache Software Foundation (ASF) under one
+ ~ or more contributor license agreements. See the NOTICE file
+ ~ distributed with this work for additional information
+ ~ regarding copyright ownership. The ASF licenses this file
+ ~ to you under the Apache License, Version 2.0 (the
+ ~ "License"); you may not use this file except in compliance
+ ~ with the License. You may obtain a copy of the License at
+ ~
+ ~ http://www.apache.org/licenses/LICENSE-2.0
+ ~
+ ~ Unless required by applicable law or agreed to in writing,
+ ~ software distributed under the License is distributed on an
+ ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ ~ KIND, either express or implied. See the License for the
+ ~ specific language governing permissions and limitations
+ ~ under the License.
+ -->
+
+
+Apache Druid supports SQL standard `ARRAY` typed columns for `STRING`, `LONG`,
and `DOUBLE` types. Other more complicated ARRAY types must be stored in
[nested columns](nested-columns.md). Druid ARRAY types are distinct from
[multi-value dimensions](multi-value-dimensions.md), which have significantly
different behavior than standard arrays.
+
+This document describes inserting, filtering, and grouping behavior for
`ARRAY` typed columns.
+Refer to the [Druid SQL data type documentation](sql-data-types.md#arrays) and
[SQL array function reference](sql-array-functions.md) for additional details
+about the functions available to use with ARRAY columns and types in SQL.
+
+The following sections describe inserting, filtering, and grouping behavior
based on the following example data, which includes 3 array typed columns.
+
+```json lines
+{"timestamp": "2023-01-01T00:00:00", "label": "row1", "arrayString": ["a",
"b"], "arrayLong":[1, null,3], "arrayDouble":[1.1, 2.2, null]}
+{"timestamp": "2023-01-01T00:00:00", "label": "row2", "arrayString": [null,
"b"], "arrayLong":null, "arrayDouble":[999, null, 5.5]}
+{"timestamp": "2023-01-01T00:00:00", "label": "row3", "arrayString": [],
"arrayLong":[1, 2, 3], "arrayDouble":[null, 2.2, 1.1]}
+{"timestamp": "2023-01-01T00:00:00", "label": "row4", "arrayString": ["a",
"b"], "arrayLong":[1, 2, 3], "arrayDouble":[]}
+{"timestamp": "2023-01-01T00:00:00", "label": "row5", "arrayString": null,
"arrayLong":[], "arrayDouble":null}
+```
+
+## Overview
+
+When using [native ingestion](../ingestion/native-batch.md), arrays can be
ingested using the [`"auto"`](../ingestion/ingestion-spec.md#dimension-objects)
type dimension schema which is shared with [type-aware schema
discovery](../ingestion/schema-design.md#type-aware-schema-discovery).
+
+When ingesting from TSV or CSV data, you can specify the array delimiters
using the `listDelimiter` field in the `inputFormat`. JSON data must be
formatted as a JSON array to be ingested as an array type. JSON data does not
require `inputFormat` configuration.
+
+The following shows an example `dimensionsSpec` for native ingestion of the
data used in this document:
+
+```
+"dimensions": [
+ {
+ "type": "auto",
+ "name": "label"
+ },
+ {
+ "type": "auto",
+ "name": "arrayString"
+ },
+ {
+ "type": "auto",
+ "name": "arrayLong"
+ },
+ {
+ "type": "auto",
+ "name": "arrayDouble"
+ }
+],
+```
+
+Arrays can also be inserted with [multi-stage
ingestion](../multi-stage-query/index.md), but must include a query context
parameter `"arrayIngestMode":"array"`.
+
+For example, to insert the data used in this document:
+```sql
+REPLACE INTO "array_example" OVERWRITE ALL
+WITH "ext" AS (
+ SELECT *
+ FROM TABLE(
+ EXTERN(
+ '{"type":"inline","data":"{\"timestamp\": \"2023-01-01T00:00:00\",
\"label\": \"row1\", \"arrayString\": [\"a\", \"b\"], \"arrayLong\":[1,
null,3], \"arrayDouble\":[1.1, 2.2, null]}\n{\"timestamp\":
\"2023-01-01T00:00:00\", \"label\": \"row2\", \"arrayString\": [null, \"b\"],
\"arrayLong\":null, \"arrayDouble\":[999, null, 5.5]}\n{\"timestamp\":
\"2023-01-01T00:00:00\", \"label\": \"row3\", \"arrayString\": [],
\"arrayLong\":[1, 2, 3], \"arrayDouble\":[null, 2.2, 1.1]} \n{\"timestamp\":
\"2023-01-01T00:00:00\", \"label\": \"row4\", \"arrayString\": [\"a\", \"b\"],
\"arrayLong\":[1, 2, 3], \"arrayDouble\":[]}\n{\"timestamp\":
\"2023-01-01T00:00:00\", \"label\": \"row5\", \"arrayString\": null,
\"arrayLong\":[], \"arrayDouble\":null}"}',
+ '{"type":"json"}',
+ '[{"name":"timestamp", "type":"STRING"},{"name":"label",
"type":"STRING"},{"name":"arrayString",
"type":"ARRAY<STRING>"},{"name":"arrayLong",
"type":"ARRAY<LONG>"},{"name":"arrayDouble", "type":"ARRAY<DOUBLE>"}]'
+ )
+ )
+)
+SELECT
+ TIME_PARSE("timestamp") AS "__time",
+ "label",
+ "arrayString",
+ "arrayLong",
+ "arrayDouble"
+FROM "ext"
+PARTITIONED BY DAY
+```
+
+These input arrays can also be grouped for rollup:
+
+```sql
+REPLACE INTO "array_example_rollup" OVERWRITE ALL
+WITH "ext" AS (
+ SELECT *
+ FROM TABLE(
+ EXTERN(
+ '{"type":"inline","data":"{\"timestamp\": \"2023-01-01T00:00:00\",
\"label\": \"row1\", \"arrayString\": [\"a\", \"b\"], \"arrayLong\":[1,
null,3], \"arrayDouble\":[1.1, 2.2, null]}\n{\"timestamp\":
\"2023-01-01T00:00:00\", \"label\": \"row2\", \"arrayString\": [null, \"b\"],
\"arrayLong\":null, \"arrayDouble\":[999, null, 5.5]}\n{\"timestamp\":
\"2023-01-01T00:00:00\", \"label\": \"row3\", \"arrayString\": [],
\"arrayLong\":[1, 2, 3], \"arrayDouble\":[null, 2.2, 1.1]} \n{\"timestamp\":
\"2023-01-01T00:00:00\", \"label\": \"row4\", \"arrayString\": [\"a\", \"b\"],
\"arrayLong\":[1, 2, 3], \"arrayDouble\":[]}\n{\"timestamp\":
\"2023-01-01T00:00:00\", \"label\": \"row5\", \"arrayString\": null,
\"arrayLong\":[], \"arrayDouble\":null}"}',
+ '{"type":"json"}',
+ '[{"name":"timestamp", "type":"STRING"},{"name":"label",
"type":"STRING"},{"name":"arrayString",
"type":"ARRAY<STRING>"},{"name":"arrayLong",
"type":"ARRAY<LONG>"},{"name":"arrayDouble", "type":"ARRAY<DOUBLE>"}]'
+ )
+ )
+)
+SELECT
+ TIME_PARSE("timestamp") AS "__time",
+ "label",
+ "arrayString",
+ "arrayLong",
+ "arrayDouble"
+FROM "ext"
+GROUP BY 1,2,3,4,5
+PARTITIONED BY DAY
+```
+
+
+## Querying ARRAYS
+
+### Filtering
+
+All query types, as well as [filtered
aggregators](aggregations.md#filtered-aggregator), can filter on array typed
columns. Filters follow these rules for array types:
+
+- Value filters, like "equality", "range" match on entire array values
Review Comment:
Weird that the text here refers to native query concepts, but the examples
are SQL. Seems like that should be better aligned. One way to do it is to
reword things a bit here, like:
> Native value filters like `equality` and `range` match on entire array
values, as do SQL constructs that plan into these native filters.
Also `equality` should link to `docs/querying/filters.md#equality-filter`,
etc.
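A concrete native-filter snippet might make the SQL-to-native mapping clearer. A sketch, assuming the `equality` filter's `matchValueType`/`matchValue` fields, of roughly what `arrayLong = ARRAY[1,2,3]` plans into:

```json
{
  "type": "equality",
  "column": "arrayLong",
  "matchValueType": "ARRAY<LONG>",
  "matchValue": [1, 2, 3]
}
```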
##########
docs/querying/sql-data-types.md:
##########
@@ -75,6 +75,17 @@ Casts between two SQL types that have different Druid
runtime types generate a r
If a value cannot be cast to the target type, as in `CAST('foo' AS BIGINT)`,
Druid substitutes [NULL](#null-values).
When `druid.generic.useDefaultValueForNull = true` (legacy mode), Druid
instead substitutes a default value, including when NULL values cast to
non-nullable types. For example, if `druid.generic.useDefaultValueForNull =
true`, a null VARCHAR cast to BIGINT is converted to a zero.
+## Arrays
+
+Druid supports [ARRAY types](arrays.md), which behave as standard SQL arrays,
where results are grouped by matching entire arrays. The [`UNNEST`
operator](./sql-array-functions.md#unn) can be used to perform operations on
individual array elements, translating each element into a separate row.
+
+ARRAY typed columns can be stored in segments with classic JSON-based ingestion
using the 'auto' typed dimension schema shared with [schema
auto-discovery](../ingestion/schema-design.md#schema-auto-discovery-for-dimensions)
to detect and ingest arrays as ARRAY typed columns. For [SQL based
ingestion](../multi-stage-query/index.md), the query context parameter
`arrayIngestMode` must be specified as `"array"` to ingest ARRAY types. In
Druid 28, the default mode for this parameter is `'mvd'` for backwards
compatibility, which instead can only handle `ARRAY<STRING>` which it stores in
[multi-value string columns](#multi-value-strings).
+
+You can convert multi-value dimensions to standard SQL arrays explicitly with
`MV_TO_ARRAY` or implicitly using [array functions](./sql-array-functions.md).
You can also use the array functions to construct arrays from multiple columns.
+
+Druid serializes `ARRAY` results as a JSON string of the array by default,
which can be controlled by the context parameter
+`sqlStringifyArrays`. When set to `false`, arrays are returned as regular JSON arrays rather than in stringified form.
Review Comment:
Surely this is only true for certain result formats? I mean, in `csv`,
everything must be stringified somehow.
##########
docs/querying/arrays.md:
##########
@@ -0,0 +1,228 @@
+---
+id: arrays
+title: "Array columns"
+---
+
+<!--
+ ~ Licensed to the Apache Software Foundation (ASF) under one
+ ~ or more contributor license agreements. See the NOTICE file
+ ~ distributed with this work for additional information
+ ~ regarding copyright ownership. The ASF licenses this file
+ ~ to you under the Apache License, Version 2.0 (the
+ ~ "License"); you may not use this file except in compliance
+ ~ with the License. You may obtain a copy of the License at
+ ~
+ ~ http://www.apache.org/licenses/LICENSE-2.0
+ ~
+ ~ Unless required by applicable law or agreed to in writing,
+ ~ software distributed under the License is distributed on an
+ ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ ~ KIND, either express or implied. See the License for the
+ ~ specific language governing permissions and limitations
+ ~ under the License.
+ -->
+
+
+Apache Druid supports SQL standard `ARRAY` typed columns for `STRING`, `LONG`,
and `DOUBLE` types. Other more complicated ARRAY types must be stored in
[nested columns](nested-columns.md). Druid ARRAY types are distinct from
[multi-value dimensions](multi-value-dimensions.md), which have significantly
different behavior than standard arrays.
+
+This document describes inserting, filtering, and grouping behavior for
`ARRAY` typed columns.
+Refer to the [Druid SQL data type documentation](sql-data-types.md#arrays) and
[SQL array function reference](sql-array-functions.md) for additional details
+about the functions available to use with ARRAY columns and types in SQL.
+
+The following sections describe inserting, filtering, and grouping behavior
based on the following example data, which includes 3 array typed columns.
+
+```json lines
+{"timestamp": "2023-01-01T00:00:00", "label": "row1", "arrayString": ["a",
"b"], "arrayLong":[1, null,3], "arrayDouble":[1.1, 2.2, null]}
+{"timestamp": "2023-01-01T00:00:00", "label": "row2", "arrayString": [null,
"b"], "arrayLong":null, "arrayDouble":[999, null, 5.5]}
+{"timestamp": "2023-01-01T00:00:00", "label": "row3", "arrayString": [],
"arrayLong":[1, 2, 3], "arrayDouble":[null, 2.2, 1.1]}
+{"timestamp": "2023-01-01T00:00:00", "label": "row4", "arrayString": ["a",
"b"], "arrayLong":[1, 2, 3], "arrayDouble":[]}
+{"timestamp": "2023-01-01T00:00:00", "label": "row5", "arrayString": null,
"arrayLong":[], "arrayDouble":null}
+```
+
+## Overview
+
+When using [native ingestion](../ingestion/native-batch.md), arrays can be
ingested using the [`"auto"`](../ingestion/ingestion-spec.md#dimension-objects)
type dimension schema which is shared with [type-aware schema
discovery](../ingestion/schema-design.md#type-aware-schema-discovery).
+
+When ingesting from TSV or CSV data, you can specify the array delimiters
using the `listDelimiter` field in the `inputFormat`. JSON data must be
formatted as a JSON array to be ingested as an array type. JSON data does not
require `inputFormat` configuration.
+
+The following shows an example `dimensionsSpec` for native ingestion of the
data used in this document:
+
+```
+"dimensions": [
+ {
+ "type": "auto",
+ "name": "label"
+ },
+ {
+ "type": "auto",
+ "name": "arrayString"
+ },
+ {
+ "type": "auto",
+ "name": "arrayLong"
+ },
+ {
+ "type": "auto",
+ "name": "arrayDouble"
+ }
+],
+```
+
+Arrays can also be inserted with [multi-stage
ingestion](../multi-stage-query/index.md), but must include a query context
parameter `"arrayIngestMode":"array"`.
+
+For example, to insert the data used in this document:
+```sql
+REPLACE INTO "array_example" OVERWRITE ALL
+WITH "ext" AS (
+ SELECT *
+ FROM TABLE(
+ EXTERN(
+ '{"type":"inline","data":"{\"timestamp\": \"2023-01-01T00:00:00\",
\"label\": \"row1\", \"arrayString\": [\"a\", \"b\"], \"arrayLong\":[1,
null,3], \"arrayDouble\":[1.1, 2.2, null]}\n{\"timestamp\":
\"2023-01-01T00:00:00\", \"label\": \"row2\", \"arrayString\": [null, \"b\"],
\"arrayLong\":null, \"arrayDouble\":[999, null, 5.5]}\n{\"timestamp\":
\"2023-01-01T00:00:00\", \"label\": \"row3\", \"arrayString\": [],
\"arrayLong\":[1, 2, 3], \"arrayDouble\":[null, 2.2, 1.1]} \n{\"timestamp\":
\"2023-01-01T00:00:00\", \"label\": \"row4\", \"arrayString\": [\"a\", \"b\"],
\"arrayLong\":[1, 2, 3], \"arrayDouble\":[]}\n{\"timestamp\":
\"2023-01-01T00:00:00\", \"label\": \"row5\", \"arrayString\": null,
\"arrayLong\":[], \"arrayDouble\":null}"}',
+ '{"type":"json"}',
+ '[{"name":"timestamp", "type":"STRING"},{"name":"label",
"type":"STRING"},{"name":"arrayString",
"type":"ARRAY<STRING>"},{"name":"arrayLong",
"type":"ARRAY<LONG>"},{"name":"arrayDouble", "type":"ARRAY<DOUBLE>"}]'
+ )
+ )
+)
+SELECT
+ TIME_PARSE("timestamp") AS "__time",
+ "label",
+ "arrayString",
+ "arrayLong",
+ "arrayDouble"
+FROM "ext"
+PARTITIONED BY DAY
+```
+
+These input arrays can also be grouped for rollup:
+
+```sql
+REPLACE INTO "array_example_rollup" OVERWRITE ALL
+WITH "ext" AS (
+ SELECT *
+ FROM TABLE(
+ EXTERN(
+ '{"type":"inline","data":"{\"timestamp\": \"2023-01-01T00:00:00\",
\"label\": \"row1\", \"arrayString\": [\"a\", \"b\"], \"arrayLong\":[1,
null,3], \"arrayDouble\":[1.1, 2.2, null]}\n{\"timestamp\":
\"2023-01-01T00:00:00\", \"label\": \"row2\", \"arrayString\": [null, \"b\"],
\"arrayLong\":null, \"arrayDouble\":[999, null, 5.5]}\n{\"timestamp\":
\"2023-01-01T00:00:00\", \"label\": \"row3\", \"arrayString\": [],
\"arrayLong\":[1, 2, 3], \"arrayDouble\":[null, 2.2, 1.1]} \n{\"timestamp\":
\"2023-01-01T00:00:00\", \"label\": \"row4\", \"arrayString\": [\"a\", \"b\"],
\"arrayLong\":[1, 2, 3], \"arrayDouble\":[]}\n{\"timestamp\":
\"2023-01-01T00:00:00\", \"label\": \"row5\", \"arrayString\": null,
\"arrayLong\":[], \"arrayDouble\":null}"}',
+ '{"type":"json"}',
+ '[{"name":"timestamp", "type":"STRING"},{"name":"label",
"type":"STRING"},{"name":"arrayString",
"type":"ARRAY<STRING>"},{"name":"arrayLong",
"type":"ARRAY<LONG>"},{"name":"arrayDouble", "type":"ARRAY<DOUBLE>"}]'
+ )
+ )
+)
+SELECT
+ TIME_PARSE("timestamp") AS "__time",
+ "label",
+ "arrayString",
+ "arrayLong",
+ "arrayDouble"
+FROM "ext"
+GROUP BY 1,2,3,4,5
+PARTITIONED BY DAY
+```
+
+
+## Querying ARRAYS
+
+### Filtering
+
+All query types, as well as [filtered
aggregators](aggregations.md#filtered-aggregator), can filter on array typed
columns. Filters follow these rules for array types:
+
+- Value filters, like "equality", "range" match on entire array values
+- The "null" filter will match rows where the entire array value is null
+- Array specific functions like ARRAY_CONTAINS and ARRAY_OVERLAP follow the
behavior specified by those functions
+- All other filters do not directly support ARRAY types
+
+#### Example: equality
+```sql
+SELECT *
+FROM "array_example"
+WHERE arrayLong = ARRAY[1,2,3]
+```
+
+```json lines
+{"__time":"2023-01-01T00:00:00.000Z","label":"row3","arrayString":"[]","arrayLong":"[1,2,3]","arrayDouble":"[null,2.2,1.1]"}
+{"__time":"2023-01-01T00:00:00.000Z","label":"row4","arrayString":"[\"a\",\"b\"]","arrayLong":"[1,2,3]","arrayDouble":"[]"}
+```
+
+#### Example: null
+```sql
+SELECT *
+FROM "array_example"
+WHERE arrayLong is null
+```
+
+```json lines
+{"__time":"2023-01-01T00:00:00.000Z","label":"row2","arrayString":"[null,\"b\"]","arrayLong":null,"arrayDouble":"[999.0,null,5.5]"}
+```
+
+#### Example: range
+```sql
+SELECT *
+FROM "array_example"
+WHERE arrayString >= ARRAY['a','b']
+```
+
+```json lines
+{"__time":"2023-01-01T00:00:00.000Z","label":"row1","arrayString":"[\"a\",\"b\"]","arrayLong":"[1,null,3]","arrayDouble":"[1.1,2.2,null]"}
+{"__time":"2023-01-01T00:00:00.000Z","label":"row4","arrayString":"[\"a\",\"b\"]","arrayLong":"[1,2,3]","arrayDouble":"[]"}
+```
+
+#### Example: ARRAY_CONTAINS
+```sql
+SELECT *
+FROM "array_example"
+WHERE ARRAY_CONTAINS(arrayString, 'a')
+```
+
+```json lines
+{"__time":"2023-01-01T00:00:00.000Z","label":"row1","arrayString":"[\"a\",\"b\"]","arrayLong":"[1,null,3]","arrayDouble":"[1.1,2.2,null]"}
+{"__time":"2023-01-01T00:00:00.000Z","label":"row4","arrayString":"[\"a\",\"b\"]","arrayLong":"[1,2,3]","arrayDouble":"[]"}
+```
+
+### Grouping
+
+When grouping on an array with SQL or a native [groupBy
queries](groupbyquery.md), grouping follows standard SQL behavior and groups on
the entire array as a single value. The [`UNNEST`](sql.md#unnest) function
allows grouping on the individual array elements.
Review Comment:
grammar: "groupBy query"
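An element-level grouping example would also help here. A sketch against the page's example data, using Druid's `UNNEST` syntax (the `t`/`el` aliases are illustrative):

```sql
SELECT "t"."el", COUNT(*) AS "cnt"
FROM "array_example" CROSS JOIN UNNEST("arrayString") AS "t"("el")
GROUP BY 1
```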
##########
docs/querying/arrays.md:
##########
@@ -0,0 +1,228 @@
+---
+id: arrays
+title: "Array columns"
+---
+
+<!--
+ ~ Licensed to the Apache Software Foundation (ASF) under one
+ ~ or more contributor license agreements. See the NOTICE file
+ ~ distributed with this work for additional information
+ ~ regarding copyright ownership. The ASF licenses this file
+ ~ to you under the Apache License, Version 2.0 (the
+ ~ "License"); you may not use this file except in compliance
+ ~ with the License. You may obtain a copy of the License at
+ ~
+ ~ http://www.apache.org/licenses/LICENSE-2.0
+ ~
+ ~ Unless required by applicable law or agreed to in writing,
+ ~ software distributed under the License is distributed on an
+ ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ ~ KIND, either express or implied. See the License for the
+ ~ specific language governing permissions and limitations
+ ~ under the License.
+ -->
+
+
+Apache Druid supports SQL standard `ARRAY` typed columns for `STRING`, `LONG`,
and `DOUBLE` types. Other more complicated ARRAY types must be stored in
[nested columns](nested-columns.md). Druid ARRAY types are distinct from
[multi-value dimensions](multi-value-dimensions.md), which have significantly
different behavior than standard arrays.
+
+This document describes inserting, filtering, and grouping behavior for
`ARRAY` typed columns.
+Refer to the [Druid SQL data type documentation](sql-data-types.md#arrays) and
[SQL array function reference](sql-array-functions.md) for additional details
+about the functions available to use with ARRAY columns and types in SQL.
+
+The following sections describe inserting, filtering, and grouping behavior
based on the following example data, which includes 3 array typed columns.
+
+```json lines
+{"timestamp": "2023-01-01T00:00:00", "label": "row1", "arrayString": ["a",
"b"], "arrayLong":[1, null,3], "arrayDouble":[1.1, 2.2, null]}
+{"timestamp": "2023-01-01T00:00:00", "label": "row2", "arrayString": [null,
"b"], "arrayLong":null, "arrayDouble":[999, null, 5.5]}
+{"timestamp": "2023-01-01T00:00:00", "label": "row3", "arrayString": [],
"arrayLong":[1, 2, 3], "arrayDouble":[null, 2.2, 1.1]}
+{"timestamp": "2023-01-01T00:00:00", "label": "row4", "arrayString": ["a",
"b"], "arrayLong":[1, 2, 3], "arrayDouble":[]}
+{"timestamp": "2023-01-01T00:00:00", "label": "row5", "arrayString": null,
"arrayLong":[], "arrayDouble":null}
+```
+
+## Overview
+
+When using [native ingestion](../ingestion/native-batch.md), arrays can be
ingested using the [`"auto"`](../ingestion/ingestion-spec.md#dimension-objects)
type dimension schema which is shared with [type-aware schema
discovery](../ingestion/schema-design.md#type-aware-schema-discovery).
+
+When ingesting from TSV or CSV data, you can specify the array delimiters
using the `listDelimiter` field in the `inputFormat`. JSON data must be
formatted as a JSON array to be ingested as an array type. JSON data does not
require `inputFormat` configuration.
+
+The following shows an example `dimensionsSpec` for native ingestion of the
data used in this document:
+
+```
+"dimensions": [
+ {
+ "type": "auto",
+ "name": "label"
+ },
+ {
+ "type": "auto",
+ "name": "arrayString"
+ },
+ {
+ "type": "auto",
+ "name": "arrayLong"
+ },
+ {
+ "type": "auto",
+ "name": "arrayDouble"
+ }
+],
+```
+
+Arrays can also be inserted with [multi-stage
ingestion](../multi-stage-query/index.md), but must include a query context
parameter `"arrayIngestMode":"array"`.
+
+For example, to insert the data used in this document:
+```sql
+REPLACE INTO "array_example" OVERWRITE ALL
+WITH "ext" AS (
+ SELECT *
+ FROM TABLE(
+ EXTERN(
+      '{"type":"inline","data":"{\"timestamp\": \"2023-01-01T00:00:00\", \"label\": \"row1\", \"arrayString\": [\"a\", \"b\"], \"arrayLong\":[1, null,3], \"arrayDouble\":[1.1, 2.2, null]}\n{\"timestamp\": \"2023-01-01T00:00:00\", \"label\": \"row2\", \"arrayString\": [null, \"b\"], \"arrayLong\":null, \"arrayDouble\":[999, null, 5.5]}\n{\"timestamp\": \"2023-01-01T00:00:00\", \"label\": \"row3\", \"arrayString\": [], \"arrayLong\":[1, 2, 3], \"arrayDouble\":[null, 2.2, 1.1]} \n{\"timestamp\": \"2023-01-01T00:00:00\", \"label\": \"row4\", \"arrayString\": [\"a\", \"b\"], \"arrayLong\":[1, 2, 3], \"arrayDouble\":[]}\n{\"timestamp\": \"2023-01-01T00:00:00\", \"label\": \"row5\", \"arrayString\": null, \"arrayLong\":[], \"arrayDouble\":null}"}',
+ '{"type":"json"}',
+      '[{"name":"timestamp", "type":"STRING"},{"name":"label", "type":"STRING"},{"name":"arrayString", "type":"ARRAY&lt;STRING&gt;"},{"name":"arrayLong", "type":"ARRAY&lt;LONG&gt;"},{"name":"arrayDouble", "type":"ARRAY&lt;DOUBLE&gt;"}]'
+ )
+ )
+)
+SELECT
+ TIME_PARSE("timestamp") AS "__time",
+ "label",
+ "arrayString",
+ "arrayLong",
+ "arrayDouble"
+FROM "ext"
+PARTITIONED BY DAY
+```
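+
+When submitting such a query through the SQL task API, the context parameter goes in the `context` object of the request body, for example (a sketch; the query string is elided):
+
+```json
+{
+  "query": "REPLACE INTO ...",
+  "context": {
+    "arrayIngestMode": "array"
+  }
+}
+```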
+
+These input arrays can also be grouped for rollup:
+
+```sql
+REPLACE INTO "array_example_rollup" OVERWRITE ALL
+WITH "ext" AS (
+ SELECT *
+ FROM TABLE(
+ EXTERN(
+      '{"type":"inline","data":"{\"timestamp\": \"2023-01-01T00:00:00\", \"label\": \"row1\", \"arrayString\": [\"a\", \"b\"], \"arrayLong\":[1, null,3], \"arrayDouble\":[1.1, 2.2, null]}\n{\"timestamp\": \"2023-01-01T00:00:00\", \"label\": \"row2\", \"arrayString\": [null, \"b\"], \"arrayLong\":null, \"arrayDouble\":[999, null, 5.5]}\n{\"timestamp\": \"2023-01-01T00:00:00\", \"label\": \"row3\", \"arrayString\": [], \"arrayLong\":[1, 2, 3], \"arrayDouble\":[null, 2.2, 1.1]} \n{\"timestamp\": \"2023-01-01T00:00:00\", \"label\": \"row4\", \"arrayString\": [\"a\", \"b\"], \"arrayLong\":[1, 2, 3], \"arrayDouble\":[]}\n{\"timestamp\": \"2023-01-01T00:00:00\", \"label\": \"row5\", \"arrayString\": null, \"arrayLong\":[], \"arrayDouble\":null}"}',
+ '{"type":"json"}',
+      '[{"name":"timestamp", "type":"STRING"},{"name":"label", "type":"STRING"},{"name":"arrayString", "type":"ARRAY&lt;STRING&gt;"},{"name":"arrayLong", "type":"ARRAY&lt;LONG&gt;"},{"name":"arrayDouble", "type":"ARRAY&lt;DOUBLE&gt;"}]'
+ )
+ )
+)
+SELECT
+ TIME_PARSE("timestamp") AS "__time",
+ "label",
+ "arrayString",
+ "arrayLong",
+ "arrayDouble"
+FROM "ext"
+GROUP BY 1,2,3,4,5
+PARTITIONED BY DAY
+```
+
+## Querying arrays
+
+### Filtering
+
+All query types, as well as [filtered aggregators](aggregations.md#filtered-aggregator), can filter on array typed columns. Filters follow these rules for array types:
+
+- Value filters, such as "equality" and "range", match on entire array values.
+- The "null" filter matches rows where the entire array value is null.
+- Array specific functions, such as [`ARRAY_CONTAINS`](sql-functions.md#array_contains) and [`ARRAY_OVERLAP`](sql-functions.md#array_overlap), follow the behavior specified by those functions.
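+
+For instance, against the example table above, an entire-array equality filter and an element-level filter could be written as follows (a sketch; results depend on your data):
+
+```sql
+-- Match rows whose arrayString value is exactly ARRAY['a', 'b']
+SELECT * FROM "array_example" WHERE "arrayString" = ARRAY['a', 'b']
+
+-- Match rows whose arrayString value contains the element 'a'
+SELECT * FROM "array_example" WHERE ARRAY_CONTAINS("arrayString", 'a')
+```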
Review Comment:
backticks around function names, & link them to the SQL function docs