This is an automated email from the ASF dual-hosted git repository.
karan pushed a commit to branch 29.0.1
in repository https://gitbox.apache.org/repos/asf/druid.git
The following commit(s) were added to refs/heads/29.0.1 by this push:
new b1a8243cf39 [Backport] MSQ: Validate that strings and string arrays
are not mixed. (#15920) (#16160)
b1a8243cf39 is described below
commit b1a8243cf390945d575d7a3acc0fb217af91c729
Author: Karan Kumar <[email protected]>
AuthorDate: Tue Mar 19 19:41:54 2024 +0530
[Backport] MSQ: Validate that strings and string arrays are not mixed.
(#15920) (#16160)
* Cherry-picking 15920-to-29.0.1
* Fixing an extra test case which got added as part of the merge
---------
Co-authored-by: Gian Merlino <[email protected]>
---
docs/multi-stage-query/concepts.md | 4 +-
docs/multi-stage-query/reference.md | 3 +-
docs/querying/arrays.md | 62 ++++-
docs/querying/multi-value-dimensions.md | 4 +-
.../org/apache/druid/msq/exec/ControllerImpl.java | 10 +-
.../org/apache/druid/msq/sql/MSQTaskSqlEngine.java | 274 ++++++++++++++++-----
.../druid/msq/util/DimensionSchemaUtils.java | 130 ++++++----
.../druid/msq/util/MultiStageQueryContext.java | 29 ++-
.../org/apache/druid/msq/exec/MSQArraysTest.java | 187 +++++++++++++-
.../org/apache/druid/msq/exec/MSQInsertTest.java | 2 +-
.../druid/msq/test/CalciteMSQTestsHelper.java | 4 +-
.../druid/msq/util/DimensionSchemaUtilsTest.java | 6 +-
.../druid/msq/util/MultiStageQueryContextTest.java | 2 +-
.../druid/sql/avatica/DruidAvaticaHandlerTest.java | 12 +
.../druid/sql/calcite/CalciteArraysQueryTest.java | 210 +++-------------
.../apache/druid/sql/calcite/CalciteQueryTest.java | 2 +
.../druid/sql/calcite/util/CalciteTests.java | 1 +
.../druid/sql/calcite/util/TestDataBuilder.java | 34 +++
18 files changed, 650 insertions(+), 326 deletions(-)
diff --git a/docs/multi-stage-query/concepts.md
b/docs/multi-stage-query/concepts.md
index 27b7d12c91c..cae88a0f375 100644
--- a/docs/multi-stage-query/concepts.md
+++ b/docs/multi-stage-query/concepts.md
@@ -200,8 +200,8 @@ To perform ingestion with rollup:
2. Set [`finalizeAggregations: false`](reference.md#context-parameters) in
your context. This causes aggregation
functions to write their internal state to the generated segments, instead
of the finalized end result, and enables
further aggregation at query time.
-3. See [ARRAY types](../querying/arrays.md#sql-based-ingestion-with-rollup)
for information about ingesting `ARRAY` columns
-4. See [multi-value
dimensions](../querying/multi-value-dimensions.md#sql-based-ingestion-with-rollup)
for information to ingest multi-value VARCHAR columns
+3. See [ARRAY types](../querying/arrays.md#sql-based-ingestion) for
information about ingesting `ARRAY` columns
+4. See [multi-value
dimensions](../querying/multi-value-dimensions.md#sql-based-ingestion) for
information about ingesting multi-value VARCHAR columns
When you do all of these things, Druid understands that you intend to do an
ingestion with rollup, and it writes
rollup-related metadata into the generated segments. Other applications can
then use [`segmentMetadata`
diff --git a/docs/multi-stage-query/reference.md
b/docs/multi-stage-query/reference.md
index 25f55b31f74..a56fae0ab2f 100644
--- a/docs/multi-stage-query/reference.md
+++ b/docs/multi-stage-query/reference.md
@@ -325,7 +325,7 @@ The following table lists the context parameters for the
MSQ task engine:
| `maxNumTasks` | SELECT, INSERT, REPLACE<br /><br />The maximum total number
of tasks to launch, including the controller task. The lowest possible value
for this setting is 2: one controller and one worker. All tasks must be able to
launch simultaneously. If they cannot, the query returns a `TaskStartTimeout`
error code after approximately 10 minutes.<br /><br />May also be provided as
`numTasks`. If both are present, `maxNumTasks` takes priority. | 2 |
| `taskAssignment` | SELECT, INSERT, REPLACE<br /><br />Determines how many
tasks to use. Possible values include: <ul><li>`max`: Uses as many tasks as
possible, up to `maxNumTasks`.</li><li>`auto`: When file sizes can be
determined through directory listing (for example: local files, S3, GCS, HDFS)
uses as few tasks as possible without exceeding 512 MiB or 10,000 files per
task, unless exceeding these limits is necessary to stay within `maxNumTasks`.
When calculating the size of files, [...]
| `finalizeAggregations` | SELECT, INSERT, REPLACE<br /><br />Determines the
type of aggregation to return. If true, Druid finalizes the results of complex
aggregations that directly appear in query results. If false, Druid returns the
aggregation's intermediate type rather than finalized type. This parameter is
useful during ingestion, where it enables storing sketches directly in Druid
tables. For more information about aggregations, see [SQL aggregation
functions](../querying/sql-aggr [...]
-| `arrayIngestMode` | INSERT, REPLACE<br /><br /> Controls how ARRAY type
values are stored in Druid segments. When set to `array` (recommended for SQL
compliance), Druid will store all ARRAY typed values in [ARRAY typed
columns](../querying/arrays.md), and supports storing both VARCHAR and numeric
typed arrays. When set to `mvd` (the default, for backwards compatibility),
Druid only supports VARCHAR typed arrays, and will store them as [multi-value
string columns](../querying/multi-valu [...]
+| `arrayIngestMode` | INSERT, REPLACE<br /><br /> Controls how ARRAY type
values are stored in Druid segments. When set to `array` (recommended for SQL
compliance), Druid will store all ARRAY typed values in [ARRAY typed
columns](../querying/arrays.md), and supports storing both VARCHAR and numeric
typed arrays. When set to `mvd` (the default, for backwards compatibility),
Druid only supports VARCHAR typed arrays, and will store them as [multi-value
string columns](../querying/multi-valu [...]
| `sqlJoinAlgorithm` | SELECT, INSERT, REPLACE<br /><br />Algorithm to use for
JOIN. Use `broadcast` (the default) for broadcast hash join or `sortMerge` for
sort-merge join. Affects all JOIN operations in the query. This is a hint to
the MSQ engine and the actual joins in the query may proceed in a different way
than specified. See [Joins](#joins) for more details. | `broadcast` |
| `rowsInMemory` | INSERT or REPLACE<br /><br />Maximum number of rows to
store in memory at once before flushing to disk during the segment generation
process. Ignored for non-INSERT queries. In most cases, use the default value.
You may need to override the default if you run into one of the [known
issues](./known-issues.md) around memory usage. | 100,000 |
| `segmentSortOrder` | INSERT or REPLACE<br /><br />Normally, Druid sorts rows
in individual segments using `__time` first, followed by the [CLUSTERED
BY](#clustered-by) clause. When you set `segmentSortOrder`, Druid sorts rows in
segments using this column list first, followed by the CLUSTERED BY order.<br
/><br />You provide the column list as comma-separated values or as a JSON
array in string form. If your query includes `__time`, then this list must
begin with `__time`. For example, [...]
@@ -338,6 +338,7 @@ The following table lists the context parameters for the
MSQ task engine:
| `waitUntilSegmentsLoad` | INSERT, REPLACE<br /><br /> If set, the ingest
query waits for the generated segments to be loaded before exiting; otherwise
the ingest query exits without waiting. If this flag is set, the task and live
reports contain information about the status of loading segments. This ensures
that any queries made after the ingestion exits will include results from the
ingestion. The drawback is that the controller task will stall until the
segments are loaded. | [...]
| `includeSegmentSource` | SELECT, INSERT, REPLACE<br /><br /> Controls the
sources that are queried for results, in addition to the segments present on
deep storage. Can be `NONE` or `REALTIME`. If this value is `NONE`, only
non-realtime (published and used) segments will be downloaded from deep
storage. If this value is `REALTIME`, results will also be included from
realtime tasks. | `NONE` |
| `rowsPerPage` | SELECT<br /><br />The number of rows per page to target. The
actual number of rows per page may be somewhat higher or lower than this
number. In most cases, use the default.<br /> This property comes into effect
only when `selectDestination` is set to `durableStorage` | 100000 |
+| `skipTypeVerification` | INSERT or REPLACE<br /><br />During query
validation, Druid validates that [string arrays](../querying/arrays.md) and
[multi-value dimensions](../querying/multi-value-dimensions.md) are not mixed
in the same column. If you are intentionally migrating from one to the other,
use this context parameter to disable type validation for the listed columns
(see the sketch after this table).<br /><br />Provide the column list as
comma-separated values or as a JSON array in string form.| empty list |
| `failOnEmptyInsert` | INSERT or REPLACE<br /><br /> When set to false (the
default), an INSERT query generating no output rows will be a no-op, and a
REPLACE query generating no output rows will delete all data that matches the
OVERWRITE clause. When set to true, an ingest query generating no output rows
will throw an `InsertCannotBeEmpty` fault. | `false` |
## Joins
diff --git a/docs/querying/arrays.md b/docs/querying/arrays.md
index dbeb3ec6e02..a7eebaa32af 100644
--- a/docs/querying/arrays.md
+++ b/docs/querying/arrays.md
@@ -71,9 +71,46 @@ The following shows an example `dimensionsSpec` for native
ingestion of the data
### SQL-based ingestion
-Arrays can also be inserted with [SQL-based
ingestion](../multi-stage-query/index.md) when you include a query context
parameter
[`"arrayIngestMode":"array"`](../multi-stage-query/reference.md#context-parameters).
+#### `arrayIngestMode`
+
+Arrays can be inserted with [SQL-based
ingestion](../multi-stage-query/index.md) when you include the query context
+parameter `arrayIngestMode: array`.
+
+When `arrayIngestMode` is `array`, SQL ARRAY types are stored using Druid
array columns. This is recommended for new
+tables.
+
+When `arrayIngestMode` is `mvd`, SQL `VARCHAR ARRAY` values are implicitly
wrapped in [`ARRAY_TO_MV`](sql-functions.md#array_to_mv).
+This causes them to be stored as [multi-value
strings](multi-value-dimensions.md), using the same `STRING` column type
+as regular scalar strings. SQL `BIGINT ARRAY` and `DOUBLE ARRAY` cannot be
loaded under `arrayIngestMode: mvd`. `mvd`
+is the default mode when `arrayIngestMode` is not provided in your query
context, although the default
+may change to `array` in a future release.
+
+When `arrayIngestMode` is `none`, Druid throws an exception when trying to
store any type of array. This mode is most
+useful when set in the system default query context with
`druid.query.default.context.arrayIngestMode = none`, in cases
+where the cluster administrator wants SQL query authors to explicitly choose
`array` or `mvd` in their query context.
+
+The following table summarizes the differences in SQL ARRAY handling between
`arrayIngestMode: array` and
+`arrayIngestMode: mvd`.
+
+| SQL type | Stored type when `arrayIngestMode: array` | Stored type when
`arrayIngestMode: mvd` (default) |
+|---|---|---|
+|`VARCHAR ARRAY`|`ARRAY<STRING>`|[multi-value
`STRING`](multi-value-dimensions.md)|
+|`BIGINT ARRAY`|`ARRAY<LONG>`|not possible (validation error)|
+|`DOUBLE ARRAY`|`ARRAY<DOUBLE>`|not possible (validation error)|
+
+In either mode, you can explicitly wrap string arrays in `ARRAY_TO_MV` to
cause them to be stored as
+[multi-value strings](multi-value-dimensions.md).
+
+When validating a SQL INSERT or REPLACE statement that contains arrays, Druid
checks whether the statement would lead
+to mixing string arrays and multi-value strings in the same column. If this
condition is detected, the statement fails
+validation unless the column is named under the `skipTypeVerification` context
parameter. This parameter can be either
+a comma-separated list of column names, or a JSON array in string form. This
validation is done to prevent accidentally
+mixing arrays and multi-value strings in the same column.
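A minimal sketch of how this list gates the check, using the helpers this
commit adds to MultiStageQueryContext. Building a QueryContext directly like
this is illustrative only, not how the planner actually wires it.

```java
import com.google.common.collect.ImmutableMap;
import java.util.Set;
import org.apache.druid.msq.util.MultiStageQueryContext;
import org.apache.druid.query.QueryContext;

// Illustrative: read the skipTypeVerification list back out of a context.
QueryContext queryContext = QueryContext.of(
    ImmutableMap.<String, Object>of(MultiStageQueryContext.CTX_SKIP_TYPE_VERIFICATION, "dim3")
);
Set<String> excluded =
    MultiStageQueryContext.getColumnsExcludedFromTypeVerification(queryContext);

// During INSERT/REPLACE validation, columns in this set bypass the
// string-array versus multi-value-string mixing check.
boolean validateDim3 = !excluded.contains("dim3");  // false here
```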
+
+#### Examples
+
+Set [`arrayIngestMode: array`](#arrayingestmode) in your query context to run
the following examples.
-For example, to insert the data used in this document:
```sql
REPLACE INTO "array_example" OVERWRITE ALL
WITH "ext" AS (
@@ -81,9 +118,14 @@ WITH "ext" AS (
FROM TABLE(
EXTERN(
'{"type":"inline","data":"{\"timestamp\": \"2023-01-01T00:00:00\",
\"label\": \"row1\", \"arrayString\": [\"a\", \"b\"], \"arrayLong\":[1,
null,3], \"arrayDouble\":[1.1, 2.2, null]}\n{\"timestamp\":
\"2023-01-01T00:00:00\", \"label\": \"row2\", \"arrayString\": [null, \"b\"],
\"arrayLong\":null, \"arrayDouble\":[999, null, 5.5]}\n{\"timestamp\":
\"2023-01-01T00:00:00\", \"label\": \"row3\", \"arrayString\": [],
\"arrayLong\":[1, 2, 3], \"arrayDouble\":[null, 2.2, [...]
- '{"type":"json"}',
- '[{"name":"timestamp", "type":"STRING"},{"name":"label",
"type":"STRING"},{"name":"arrayString",
"type":"ARRAY<STRING>"},{"name":"arrayLong",
"type":"ARRAY<LONG>"},{"name":"arrayDouble", "type":"ARRAY<DOUBLE>"}]'
+ '{"type":"json"}'
)
+ ) EXTEND (
+ "timestamp" VARCHAR,
+ "label" VARCHAR,
+ "arrayString" VARCHAR ARRAY,
+ "arrayLong" BIGINT ARRAY,
+ "arrayDouble" DOUBLE ARRAY
)
)
SELECT
@@ -96,8 +138,7 @@ FROM "ext"
PARTITIONED BY DAY
```
-### SQL-based ingestion with rollup
-These input arrays can also be grouped for rollup:
+Arrays can also be used as `GROUP BY` keys for rollup:
```sql
REPLACE INTO "array_example_rollup" OVERWRITE ALL
@@ -106,9 +147,14 @@ WITH "ext" AS (
FROM TABLE(
EXTERN(
'{"type":"inline","data":"{\"timestamp\": \"2023-01-01T00:00:00\",
\"label\": \"row1\", \"arrayString\": [\"a\", \"b\"], \"arrayLong\":[1,
null,3], \"arrayDouble\":[1.1, 2.2, null]}\n{\"timestamp\":
\"2023-01-01T00:00:00\", \"label\": \"row2\", \"arrayString\": [null, \"b\"],
\"arrayLong\":null, \"arrayDouble\":[999, null, 5.5]}\n{\"timestamp\":
\"2023-01-01T00:00:00\", \"label\": \"row3\", \"arrayString\": [],
\"arrayLong\":[1, 2, 3], \"arrayDouble\":[null, 2.2, [...]
- '{"type":"json"}',
- '[{"name":"timestamp", "type":"STRING"},{"name":"label",
"type":"STRING"},{"name":"arrayString",
"type":"ARRAY<STRING>"},{"name":"arrayLong",
"type":"ARRAY<LONG>"},{"name":"arrayDouble", "type":"ARRAY<DOUBLE>"}]'
+ '{"type":"json"}'
)
+ ) EXTEND (
+ "timestamp" VARCHAR,
+ "label" VARCHAR,
+ "arrayString" VARCHAR ARRAY,
+ "arrayLong" BIGINT ARRAY,
+ "arrayDouble" DOUBLE ARRAY
)
)
SELECT
diff --git a/docs/querying/multi-value-dimensions.md
b/docs/querying/multi-value-dimensions.md
index 2b33737a36f..1ce3a618dac 100644
--- a/docs/querying/multi-value-dimensions.md
+++ b/docs/querying/multi-value-dimensions.md
@@ -507,9 +507,9 @@ Avoid confusing string arrays with [multi-value
dimensions](multi-value-dimensio
Use care during ingestion to ensure you get the type you want.
-To get arrays when performing an ingestion using JSON ingestion specs, such as
[native batch](../ingestion/native-batch.md) or streaming ingestion such as
with [Apache Kafka](../ingestion/kafka-ingestion.md), use dimension type `auto`
or enable `useSchemaDiscovery`. When performing a [SQL-based
ingestion](../multi-stage-query/index.md), write a query that generates arrays
and set the context parameter `"arrayIngestMode": "array"`. Arrays may contain
strings or numbers.
+To get arrays when performing an ingestion using JSON ingestion specs, such as
[native batch](../ingestion/native-batch.md) or streaming ingestion such as
with [Apache Kafka](../ingestion/kafka-ingestion.md), use dimension type `auto`
or enable `useSchemaDiscovery`. When performing a [SQL-based
ingestion](../multi-stage-query/index.md), write a query that generates arrays
and set the context parameter [`"arrayIngestMode":
"array"`](arrays.md#arrayingestmode). Arrays may contain strings o [...]
-To get multi-value dimensions when performing an ingestion using JSON
ingestion specs, use dimension type `string` and do not enable
`useSchemaDiscovery`. When performing a [SQL-based
ingestion](../multi-stage-query/index.md), wrap arrays in
[`ARRAY_TO_MV`](multi-value-dimensions.md#sql-based-ingestion), which ensures
you get multi-value dimensions in any `arrayIngestMode`. Multi-value dimensions
can only contain strings.
+To get multi-value dimensions when performing an ingestion using JSON
ingestion specs, use dimension type `string` and do not enable
`useSchemaDiscovery`. When performing a [SQL-based
ingestion](../multi-stage-query/index.md), wrap arrays in
[`ARRAY_TO_MV`](multi-value-dimensions.md#sql-based-ingestion), which ensures
you get multi-value dimensions in any
[`arrayIngestMode`](arrays.md#arrayingestmode). Multi-value dimensions can only
contain strings.
You can tell which type you have by checking the `INFORMATION_SCHEMA.COLUMNS`
table, using a query like:
diff --git
a/extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/exec/ControllerImpl.java
b/extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/exec/ControllerImpl.java
index bb3c4369527..d926fcd0515 100644
---
a/extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/exec/ControllerImpl.java
+++
b/extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/exec/ControllerImpl.java
@@ -2136,9 +2136,13 @@ public class ControllerImpl implements Controller
// deprecation and removal in future
if (MultiStageQueryContext.getArrayIngestMode(query.context()) ==
ArrayIngestMode.MVD) {
log.warn(
- "'%s' is set to 'mvd' in the query's context. This ingests the
string arrays as multi-value "
- + "strings instead of arrays, and is preserved for legacy reasons
when MVDs were the only way to ingest string "
- + "arrays in Druid. It is incorrect behaviour and will likely be
removed in the future releases of Druid",
+ "%s[mvd] is active for this task. This causes string arrays (VARCHAR
ARRAY in SQL) to be ingested as "
+ + "multi-value strings rather than true arrays. This behavior may
change in a future version of Druid. To be "
+ + "compatible with future behavior changes, we recommend setting %s
to[array], which creates a clearer "
+ + "separation between multi-value strings and true arrays. In
either[mvd] or[array] mode, you can write "
+ + "out multi-value string dimensions using ARRAY_TO_MV. "
+ + "See
https://druid.apache.org/docs/latest/querying/arrays#arrayingestmode for more
details.",
+ MultiStageQueryContext.CTX_ARRAY_INGEST_MODE,
MultiStageQueryContext.CTX_ARRAY_INGEST_MODE
);
}
diff --git
a/extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/sql/MSQTaskSqlEngine.java
b/extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/sql/MSQTaskSqlEngine.java
index 6f4f109ffa4..8fbb52df273 100644
---
a/extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/sql/MSQTaskSqlEngine.java
+++
b/extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/sql/MSQTaskSqlEngine.java
@@ -29,20 +29,29 @@ import org.apache.calcite.rel.core.Project;
import org.apache.calcite.rel.core.Sort;
import org.apache.calcite.rel.type.RelDataType;
import org.apache.calcite.rel.type.RelDataTypeFactory;
+import org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.calcite.schema.Table;
+import org.apache.calcite.sql.dialect.CalciteSqlDialect;
import org.apache.calcite.sql.type.SqlTypeName;
import org.apache.calcite.util.Pair;
import org.apache.druid.error.DruidException;
import org.apache.druid.error.InvalidInput;
import org.apache.druid.error.InvalidSqlInput;
+import org.apache.druid.java.util.common.StringUtils;
import org.apache.druid.java.util.common.granularity.Granularities;
import org.apache.druid.java.util.common.granularity.Granularity;
import org.apache.druid.msq.querykit.QueryKitUtils;
+import org.apache.druid.msq.util.ArrayIngestMode;
+import org.apache.druid.msq.util.DimensionSchemaUtils;
import org.apache.druid.msq.util.MultiStageQueryContext;
import org.apache.druid.rpc.indexing.OverlordClient;
import org.apache.druid.segment.column.ColumnHolder;
+import org.apache.druid.segment.column.ColumnType;
+import org.apache.druid.segment.column.ValueType;
import org.apache.druid.sql.calcite.parser.DruidSqlIngest;
import org.apache.druid.sql.calcite.parser.DruidSqlInsert;
import org.apache.druid.sql.calcite.planner.Calcites;
+import org.apache.druid.sql.calcite.planner.DruidTypeSystem;
import org.apache.druid.sql.calcite.planner.PlannerContext;
import org.apache.druid.sql.calcite.run.EngineFeature;
import org.apache.druid.sql.calcite.run.NativeSqlEngine;
@@ -50,7 +59,9 @@ import org.apache.druid.sql.calcite.run.QueryMaker;
import org.apache.druid.sql.calcite.run.SqlEngine;
import org.apache.druid.sql.calcite.run.SqlEngines;
import org.apache.druid.sql.destination.IngestDestination;
+import org.apache.druid.sql.destination.TableDestination;
+import javax.annotation.Nullable;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
@@ -162,7 +173,18 @@ public class MSQTaskSqlEngine implements SqlEngine
final PlannerContext plannerContext
)
{
- validateInsert(relRoot.rel, relRoot.fields, plannerContext);
+ validateInsert(
+ relRoot.rel,
+ relRoot.fields,
+ destination instanceof TableDestination
+ ? plannerContext.getPlannerToolbox()
+ .rootSchema()
+
.getNamedSchema(plannerContext.getPlannerToolbox().druidSchemaName())
+ .getSchema()
+ .getTable(((TableDestination)
destination).getTableName())
+ : null,
+ plannerContext
+ );
return new MSQTaskQueryMaker(
destination,
@@ -193,65 +215,78 @@ public class MSQTaskSqlEngine implements SqlEngine
}
}
+ /**
+ * Engine-specific validation that happens after the query is planned.
+ */
private static void validateInsert(
final RelNode rootRel,
final List<Pair<Integer, String>> fieldMappings,
+ @Nullable Table targetTable,
final PlannerContext plannerContext
)
{
+ final int timeColumnIndex = getTimeColumnIndex(fieldMappings);
+ final Granularity segmentGranularity =
getSegmentGranularity(plannerContext);
+
validateNoDuplicateAliases(fieldMappings);
+ validateTimeColumnType(rootRel, timeColumnIndex);
+ validateTimeColumnExistsIfNeeded(timeColumnIndex, segmentGranularity);
+ validateLimitAndOffset(rootRel,
Granularities.ALL.equals(segmentGranularity));
+ validateTypeChanges(rootRel, fieldMappings, targetTable, plannerContext);
+ }
- // Find the __time field.
- int timeFieldIndex = -1;
+ /**
+ * SQL allows multiple output columns with the same name. However, we don't
allow this for INSERT or REPLACE
+ * queries, because we use these output names to generate columns in
segments. They must be unique.
+ */
+ private static void validateNoDuplicateAliases(final List<Pair<Integer,
String>> fieldMappings)
+ {
+ final Set<String> aliasesSeen = new HashSet<>();
for (final Pair<Integer, String> field : fieldMappings) {
- if (field.right.equals(ColumnHolder.TIME_COLUMN_NAME)) {
- timeFieldIndex = field.left;
-
- // Validate the __time field has the proper type.
- final SqlTypeName timeType =
rootRel.getRowType().getFieldList().get(field.left).getType().getSqlTypeName();
- if (timeType != SqlTypeName.TIMESTAMP) {
- throw InvalidSqlInput.exception(
- "Field [%s] was the wrong type [%s], expected TIMESTAMP",
- ColumnHolder.TIME_COLUMN_NAME,
- timeType
- );
- }
+ if (!aliasesSeen.add(field.right)) {
+ throw InvalidSqlInput.exception("Duplicate field in SELECT: [%s]",
field.right);
}
}
+ }
- // Validate that if segmentGranularity is not ALL then there is also a
__time field.
- final Granularity segmentGranularity;
-
- try {
- segmentGranularity = QueryKitUtils.getSegmentGranularityFromContext(
- plannerContext.getJsonMapper(),
- plannerContext.queryContextMap()
- );
+ /**
+ * Validate the time field {@link ColumnHolder#TIME_COLUMN_NAME} has type
TIMESTAMP.
+ *
+ * @param rootRel root rel
+ * @param timeColumnIndex index of the time field
+ */
+ private static void validateTimeColumnType(final RelNode rootRel, final int
timeColumnIndex)
+ {
+ if (timeColumnIndex < 0) {
+ return;
}
- catch (Exception e) {
- // This is a defensive check as the
DruidSqlInsert.SQL_INSERT_SEGMENT_GRANULARITY in the query context is
- // populated by Druid. If the user entered an incorrect granularity,
that should have been flagged before reaching
- // here
- throw DruidException.forPersona(DruidException.Persona.DEVELOPER)
- .ofCategory(DruidException.Category.DEFENSIVE)
- .build(
- e,
- "[%s] is not a valid value for [%s]",
-
plannerContext.queryContext().get(DruidSqlInsert.SQL_INSERT_SEGMENT_GRANULARITY),
- DruidSqlInsert.SQL_INSERT_SEGMENT_GRANULARITY
- );
+ // Validate the __time field has the proper type.
+ final SqlTypeName timeType =
rootRel.getRowType().getFieldList().get(timeColumnIndex).getType().getSqlTypeName();
+ if (timeType != SqlTypeName.TIMESTAMP) {
+ throw InvalidSqlInput.exception(
+ "Field[%s] was the wrong type[%s], expected TIMESTAMP",
+ ColumnHolder.TIME_COLUMN_NAME,
+ timeType
+ );
}
+ }
+ /**
+ * Validate that if segmentGranularity is not ALL, then there is also a
{@link ColumnHolder#TIME_COLUMN_NAME} field.
+ *
+ * @param segmentGranularity granularity from {@link
#getSegmentGranularity(PlannerContext)}
+ * @param timeColumnIndex index of the time field
+ */
+ private static void validateTimeColumnExistsIfNeeded(
+ final int timeColumnIndex,
+ final Granularity segmentGranularity
+ )
+ {
final boolean hasSegmentGranularity =
!Granularities.ALL.equals(segmentGranularity);
- // Validate that the query does not have an inappropriate LIMIT or OFFSET.
LIMIT prevents gathering result key
- // statistics, which INSERT execution logic depends on. (In QueryKit,
LIMIT disables statistics generation and
- // funnels everything through a single partition.)
- validateLimitAndOffset(rootRel, !hasSegmentGranularity);
-
- if (hasSegmentGranularity && timeFieldIndex < 0) {
+ if (hasSegmentGranularity && timeColumnIndex < 0) {
throw InvalidInput.exception(
"The granularity [%s] specified in the PARTITIONED BY clause of the
INSERT query is different from ALL. "
+ "Therefore, the query must specify a time column (named __time).",
@@ -261,29 +296,24 @@ public class MSQTaskSqlEngine implements SqlEngine
}
/**
- * SQL allows multiple output columns with the same name. However, we don't
allow this for INSERT or REPLACE
- * queries, because we use these output names to generate columns in
segments. They must be unique.
+ * Validate that the query does not have an inappropriate LIMIT or OFFSET.
LIMIT prevents gathering result key
+ * statistics, which INSERT execution logic depends on. (In QueryKit, LIMIT
disables statistics generation and
+ * funnels everything through a single partition.)
+ *
+ * LIMIT is allowed when segment granularity is ALL, disallowed otherwise.
OFFSET is never allowed.
+ *
+ * @param rootRel root rel
+ * @param limitOk whether LIMIT is ok (OFFSET is never ok)
*/
- private static void validateNoDuplicateAliases(final List<Pair<Integer,
String>> fieldMappings)
- {
- final Set<String> aliasesSeen = new HashSet<>();
-
- for (final Pair<Integer, String> field : fieldMappings) {
- if (!aliasesSeen.add(field.right)) {
- throw InvalidSqlInput.exception("Duplicate field in SELECT: [%s]",
field.right);
- }
- }
- }
-
- private static void validateLimitAndOffset(final RelNode topRel, final
boolean limitOk)
+ private static void validateLimitAndOffset(final RelNode rootRel, final
boolean limitOk)
{
Sort sort = null;
- if (topRel instanceof Sort) {
- sort = (Sort) topRel;
- } else if (topRel instanceof Project) {
+ if (rootRel instanceof Sort) {
+ sort = (Sort) rootRel;
+ } else if (rootRel instanceof Project) {
// Look for Project after a Sort, then validate the sort.
- final Project project = (Project) topRel;
+ final Project project = (Project) rootRel;
if (project.isMapping()) {
final RelNode projectInput = project.getInput();
if (projectInput instanceof Sort) {
@@ -307,6 +337,132 @@ public class MSQTaskSqlEngine implements SqlEngine
}
}
+ /**
+ * Validate that the query does not include any type changes from string to
array or vice versa.
+ *
+ * These type changes tend to cause problems due to mixing of multi-value
strings and string arrays. In particular,
+ * many queries written in the "classic MVD" style (treating MVDs as if they
were regular strings) will fail when
+ * MVDs and arrays are mixed. So, we detect them as invalid.
+ *
+ * @param rootRel root rel
+ * @param fieldMappings field mappings from {@link #validateInsert(RelNode,
List, Table, PlannerContext)}
+ * @param targetTable table we are inserting (or replacing) into, if any
+ * @param plannerContext planner context
+ */
+ private static void validateTypeChanges(
+ final RelNode rootRel,
+ final List<Pair<Integer, String>> fieldMappings,
+ @Nullable final Table targetTable,
+ final PlannerContext plannerContext
+ )
+ {
+ if (targetTable == null) {
+ return;
+ }
+
+ final Set<String> columnsExcludedFromTypeVerification =
+
MultiStageQueryContext.getColumnsExcludedFromTypeVerification(plannerContext.queryContext());
+ final ArrayIngestMode arrayIngestMode =
MultiStageQueryContext.getArrayIngestMode(plannerContext.queryContext());
+
+ for (Pair<Integer, String> fieldMapping : fieldMappings) {
+ final int columnIndex = fieldMapping.left;
+ final String columnName = fieldMapping.right;
+ final RelDataTypeField oldSqlTypeField =
+
targetTable.getRowType(DruidTypeSystem.TYPE_FACTORY).getField(columnName, true,
false);
+
+ if (!columnsExcludedFromTypeVerification.contains(columnName) &&
oldSqlTypeField != null) {
+ final ColumnType oldDruidType =
Calcites.getColumnTypeForRelDataType(oldSqlTypeField.getType());
+ final RelDataType newSqlType =
rootRel.getRowType().getFieldList().get(columnIndex).getType();
+ final ColumnType newDruidType =
+
DimensionSchemaUtils.getDimensionType(Calcites.getColumnTypeForRelDataType(newSqlType),
arrayIngestMode);
+
+ if (newDruidType.isArray() && oldDruidType.is(ValueType.STRING)
+ || (newDruidType.is(ValueType.STRING) && oldDruidType.isArray())) {
+ final StringBuilder messageBuilder = new StringBuilder(
+ StringUtils.format(
+ "Cannot write into field[%s] using type[%s] and
arrayIngestMode[%s], since the existing type is[%s]",
+ columnName,
+ newSqlType,
+ StringUtils.toLowerCase(arrayIngestMode.toString()),
+ oldSqlTypeField.getType()
+ )
+ );
+
+ if (newDruidType.is(ValueType.STRING)
+ && newSqlType.getSqlTypeName() == SqlTypeName.ARRAY
+ && arrayIngestMode == ArrayIngestMode.MVD) {
+ // Tried to insert a SQL ARRAY, which got turned into a STRING by
arrayIngestMode: mvd.
+ messageBuilder.append(". Try setting arrayIngestMode to[array] to
retain the SQL type[")
+ .append(newSqlType)
+ .append("]");
+ } else if (newDruidType.is(ValueType.ARRAY)
+ && oldDruidType.is(ValueType.STRING)
+ && arrayIngestMode == ArrayIngestMode.ARRAY) {
+ // Tried to insert a SQL ARRAY, which stayed an ARRAY, but wasn't
compatible with existing STRING.
+ messageBuilder.append(". Try wrapping this field using
ARRAY_TO_MV(...) AS ")
+
.append(CalciteSqlDialect.DEFAULT.quoteIdentifier(columnName));
+ } else if (newDruidType.is(ValueType.STRING) &&
oldDruidType.is(ValueType.ARRAY)) {
+ // Tried to insert a SQL VARCHAR, but wasn't compatible with
existing ARRAY.
+ messageBuilder.append(". Try");
+ if (arrayIngestMode == ArrayIngestMode.MVD) {
+ messageBuilder.append(" setting arrayIngestMode to[array] and");
+ }
+ messageBuilder.append(" adjusting your query to make this column
an ARRAY instead of VARCHAR");
+ }
+
+ messageBuilder.append(". See
https://druid.apache.org/docs/latest/querying/arrays#arrayingestmode "
+ + "for more details about this check and how
to override it if needed.");
+
+ throw
InvalidSqlInput.exception(StringUtils.encodeForFormat(messageBuilder.toString()));
+ }
+ }
+ }
+ }
+
+ /**
+ * Returns the index of {@link ColumnHolder#TIME_COLUMN_NAME} within a list
of field mappings from
+ * {@link #validateInsert(RelNode, List, Table, PlannerContext)}.
+ *
+ * Returns -1 if the list does not contain a time column.
+ */
+ private static int getTimeColumnIndex(final List<Pair<Integer, String>>
fieldMappings)
+ {
+ for (final Pair<Integer, String> field : fieldMappings) {
+ if (field.right.equals(ColumnHolder.TIME_COLUMN_NAME)) {
+ return field.left;
+ }
+ }
+
+ return -1;
+ }
+
+ /**
+ * Retrieve the segment granularity for a query.
+ */
+ private static Granularity getSegmentGranularity(final PlannerContext
plannerContext)
+ {
+ try {
+ return QueryKitUtils.getSegmentGranularityFromContext(
+ plannerContext.getJsonMapper(),
+ plannerContext.queryContextMap()
+ );
+ }
+ catch (Exception e) {
+ // This is a defensive check as the
DruidSqlInsert.SQL_INSERT_SEGMENT_GRANULARITY in the query context is
+ // populated by Druid. If the user entered an incorrect granularity,
that should have been flagged before reaching
+ // here.
+ throw DruidException.forPersona(DruidException.Persona.DEVELOPER)
+ .ofCategory(DruidException.Category.DEFENSIVE)
+ .build(
+ e,
+ "[%s] is not a valid value for [%s]",
+
plannerContext.queryContext().get(DruidSqlInsert.SQL_INSERT_SEGMENT_GRANULARITY),
+ DruidSqlInsert.SQL_INSERT_SEGMENT_GRANULARITY
+ );
+
+ }
+ }
+
private static RelDataType getMSQStructType(RelDataTypeFactory typeFactory)
{
return typeFactory.createStructType(
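To make the new validateTypeChanges check concrete, here is the shape of a
statement it rejects, mirroring the MSQArraysTest cases later in this diff
(`foo` is the test fixture whose `dim3` is a classic multi-value string). The
SQL is held in a Java string only to keep a single example language.

```java
// With arrayIngestMode=array, MV_TO_ARRAY(dim3) produces a VARCHAR ARRAY,
// which conflicts with the existing multi-value STRING column in foo.
String sql = "REPLACE INTO foo OVERWRITE ALL\n"
    + "SELECT MV_TO_ARRAY(dim3) AS dim3 FROM foo\n"
    + "PARTITIONED BY ALL TIME";
// Planning fails with:
//   Cannot write into field[dim3] using type[VARCHAR ARRAY] and
//   arrayIngestMode[array], since the existing type is[VARCHAR]
// Adding "skipTypeVerification": "dim3" to the context allows the write.
```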
diff --git
a/extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/util/DimensionSchemaUtils.java
b/extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/util/DimensionSchemaUtils.java
index 748d411c97c..07a51821616 100644
---
a/extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/util/DimensionSchemaUtils.java
+++
b/extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/util/DimensionSchemaUtils.java
@@ -57,9 +57,19 @@ public class DimensionSchemaUtils
);
}
+ /**
+ * Create a dimension schema for a dimension column, given the type that it
was assigned in the query, and the
+ * current values of {@link MultiStageQueryContext#CTX_USE_AUTO_SCHEMAS} and
+ * {@link MultiStageQueryContext#CTX_ARRAY_INGEST_MODE}.
+ *
+ * @param column column name
+ * @param queryType type of the column from the query
+ * @param useAutoType active value of {@link
MultiStageQueryContext#CTX_USE_AUTO_SCHEMAS}
+ * @param arrayIngestMode active value of {@link
MultiStageQueryContext#CTX_ARRAY_INGEST_MODE}
+ */
public static DimensionSchema createDimensionSchema(
final String column,
- @Nullable final ColumnType type,
+ @Nullable final ColumnType queryType,
boolean useAutoType,
ArrayIngestMode arrayIngestMode
)
@@ -67,66 +77,92 @@ public class DimensionSchemaUtils
if (useAutoType) {
// for complex types that are not COMPLEX<json>, we still want to use
the handler since 'auto' typing
// only works for the 'standard' built-in types
- if (type != null && type.is(ValueType.COMPLEX) &&
!ColumnType.NESTED_DATA.equals(type)) {
- final ColumnCapabilities capabilities =
ColumnCapabilitiesImpl.createDefault().setType(type);
+ if (queryType != null && queryType.is(ValueType.COMPLEX) &&
!ColumnType.NESTED_DATA.equals(queryType)) {
+ final ColumnCapabilities capabilities =
ColumnCapabilitiesImpl.createDefault().setType(queryType);
return DimensionHandlerUtils.getHandlerFromCapabilities(column,
capabilities, null)
.getDimensionSchema(capabilities);
}
- if (type != null && (type.isPrimitive() || type.isPrimitiveArray())) {
- return new AutoTypeColumnSchema(column, type);
+ if (queryType != null && (queryType.isPrimitive() ||
queryType.isPrimitiveArray())) {
+ return new AutoTypeColumnSchema(column, queryType);
}
return new AutoTypeColumnSchema(column, null);
} else {
- // if schema information is not available, create a string dimension
- if (type == null) {
- return new StringDimensionSchema(column);
- } else if (type.getType() == ValueType.STRING) {
- return new StringDimensionSchema(column);
- } else if (type.getType() == ValueType.LONG) {
+ // dimensionType may not be identical to queryType, depending on
arrayIngestMode.
+ final ColumnType dimensionType = getDimensionType(queryType,
arrayIngestMode);
+
+ if (dimensionType.getType() == ValueType.STRING) {
+ return new StringDimensionSchema(
+ column,
+ queryType != null && queryType.isArray()
+ ? DimensionSchema.MultiValueHandling.ARRAY
+ : DimensionSchema.MultiValueHandling.SORTED_ARRAY,
+ null
+ );
+ } else if (dimensionType.getType() == ValueType.LONG) {
return new LongDimensionSchema(column);
- } else if (type.getType() == ValueType.FLOAT) {
+ } else if (dimensionType.getType() == ValueType.FLOAT) {
return new FloatDimensionSchema(column);
- } else if (type.getType() == ValueType.DOUBLE) {
+ } else if (dimensionType.getType() == ValueType.DOUBLE) {
return new DoubleDimensionSchema(column);
- } else if (type.getType() == ValueType.ARRAY) {
- ValueType elementType = type.getElementType().getType();
- if (elementType == ValueType.STRING) {
- if (arrayIngestMode == ArrayIngestMode.NONE) {
- throw InvalidInput.exception(
- "String arrays can not be ingested when '%s' is set to '%s'.
Set '%s' in query context "
- + "to 'array' to ingest the string array as an array, or
ingest it as an MVD by explicitly casting the "
- + "array to an MVD with ARRAY_TO_MV function.",
- MultiStageQueryContext.CTX_ARRAY_INGEST_MODE,
- StringUtils.toLowerCase(arrayIngestMode.name()),
- MultiStageQueryContext.CTX_ARRAY_INGEST_MODE
- );
- } else if (arrayIngestMode == ArrayIngestMode.MVD) {
- return new StringDimensionSchema(column,
DimensionSchema.MultiValueHandling.ARRAY, null);
- } else {
- // arrayIngestMode == ArrayIngestMode.ARRAY would be true
- return new AutoTypeColumnSchema(column, type);
- }
- } else if (elementType.isNumeric()) {
- // ValueType == LONG || ValueType == FLOAT || ValueType == DOUBLE
- if (arrayIngestMode == ArrayIngestMode.ARRAY) {
- return new AutoTypeColumnSchema(column, type);
- } else {
- throw InvalidInput.exception(
- "Numeric arrays can only be ingested when '%s' is set to
'array' in the MSQ query's context. "
- + "Current value of the parameter [%s]",
- MultiStageQueryContext.CTX_ARRAY_INGEST_MODE,
- StringUtils.toLowerCase(arrayIngestMode.name())
- );
- }
- } else {
- throw new ISE("Cannot create dimension for type [%s]",
type.toString());
- }
+ } else if (dimensionType.getType() == ValueType.ARRAY) {
+ return new AutoTypeColumnSchema(column, dimensionType);
} else {
- final ColumnCapabilities capabilities =
ColumnCapabilitiesImpl.createDefault().setType(type);
+ final ColumnCapabilities capabilities =
ColumnCapabilitiesImpl.createDefault().setType(dimensionType);
return DimensionHandlerUtils.getHandlerFromCapabilities(column,
capabilities, null)
.getDimensionSchema(capabilities);
}
}
}
+
+ /**
+ * Based on a type from a query result, get the type of dimension we should
write.
+ *
+ * @throws org.apache.druid.error.DruidException if there is some problem
+ */
+ public static ColumnType getDimensionType(
+ @Nullable final ColumnType queryType,
+ final ArrayIngestMode arrayIngestMode
+ )
+ {
+ if (queryType == null) {
+ // if schema information is not available, create a string dimension
+ return ColumnType.STRING;
+ } else if (queryType.getType() == ValueType.ARRAY) {
+ ValueType elementType = queryType.getElementType().getType();
+ if (elementType == ValueType.STRING) {
+ if (arrayIngestMode == ArrayIngestMode.NONE) {
+ throw InvalidInput.exception(
+ "String arrays can not be ingested when '%s' is set to '%s'. Set
'%s' in query context "
+ + "to 'array' to ingest the string array as an array, or ingest
it as an MVD by explicitly casting the "
+ + "array to an MVD with the ARRAY_TO_MV function.",
+ MultiStageQueryContext.CTX_ARRAY_INGEST_MODE,
+ StringUtils.toLowerCase(arrayIngestMode.name()),
+ MultiStageQueryContext.CTX_ARRAY_INGEST_MODE
+ );
+ } else if (arrayIngestMode == ArrayIngestMode.MVD) {
+ return ColumnType.STRING;
+ } else {
+ assert arrayIngestMode == ArrayIngestMode.ARRAY;
+ return queryType;
+ }
+ } else if (elementType.isNumeric()) {
+ // ValueType == LONG || ValueType == FLOAT || ValueType == DOUBLE
+ if (arrayIngestMode == ArrayIngestMode.ARRAY) {
+ return queryType;
+ } else {
+ throw InvalidInput.exception(
+ "Numeric arrays can only be ingested when '%s' is set to
'array'. "
+ + "Current value of the parameter is[%s]",
+ MultiStageQueryContext.CTX_ARRAY_INGEST_MODE,
+ StringUtils.toLowerCase(arrayIngestMode.name())
+ );
+ }
+ } else {
+ throw new ISE("Cannot create dimension for type[%s]",
queryType.toString());
+ }
+ } else {
+ return queryType;
+ }
+ }
}
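A hedged sketch of the mappings getDimensionType performs, written as bare
assertions purely for illustration; the expected values follow directly from
the branches above.

```java
import org.apache.druid.msq.util.ArrayIngestMode;
import org.apache.druid.msq.util.DimensionSchemaUtils;
import org.apache.druid.segment.column.ColumnType;

// mvd mode: string arrays collapse to multi-value STRING.
assert ColumnType.STRING.equals(
    DimensionSchemaUtils.getDimensionType(ColumnType.STRING_ARRAY, ArrayIngestMode.MVD));
// array mode: the array type is retained.
assert ColumnType.STRING_ARRAY.equals(
    DimensionSchemaUtils.getDimensionType(ColumnType.STRING_ARRAY, ArrayIngestMode.ARRAY));
// No schema information: default to STRING.
assert ColumnType.STRING.equals(
    DimensionSchemaUtils.getDimensionType(null, ArrayIngestMode.ARRAY));
// Numeric arrays outside array mode throw InvalidInput (a DruidException).
```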
diff --git
a/extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/util/MultiStageQueryContext.java
b/extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/util/MultiStageQueryContext.java
index b7340343c81..3cb49b7d05e 100644
---
a/extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/util/MultiStageQueryContext.java
+++
b/extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/util/MultiStageQueryContext.java
@@ -43,7 +43,9 @@ import javax.annotation.Nullable;
import java.io.IOException;
import java.util.Arrays;
import java.util.Collections;
+import java.util.HashSet;
import java.util.List;
+import java.util.Set;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
@@ -152,6 +154,7 @@ public class MultiStageQueryContext
public static final String CTX_ARRAY_INGEST_MODE = "arrayIngestMode";
public static final ArrayIngestMode DEFAULT_ARRAY_INGEST_MODE =
ArrayIngestMode.MVD;
+ public static final String CTX_SKIP_TYPE_VERIFICATION =
"skipTypeVerification";
private static final Pattern LOOKS_LIKE_JSON_ARRAY =
Pattern.compile("^\\s*\\[.*", Pattern.DOTALL);
@@ -297,7 +300,7 @@ public class MultiStageQueryContext
public static List<String> getSortOrder(final QueryContext queryContext)
{
- return
MultiStageQueryContext.decodeSortOrder(queryContext.getString(CTX_SORT_ORDER));
+ return decodeList(CTX_SORT_ORDER, queryContext.getString(CTX_SORT_ORDER));
}
@Nullable
@@ -316,37 +319,39 @@ public class MultiStageQueryContext
return queryContext.getEnum(CTX_ARRAY_INGEST_MODE, ArrayIngestMode.class,
DEFAULT_ARRAY_INGEST_MODE);
}
+ public static Set<String> getColumnsExcludedFromTypeVerification(final
QueryContext queryContext)
+ {
+ return new HashSet<>(decodeList(CTX_SKIP_TYPE_VERIFICATION,
queryContext.getString(CTX_SKIP_TYPE_VERIFICATION)));
+ }
+
/**
- * Decodes {@link #CTX_SORT_ORDER} from either a JSON or CSV string.
+ * Decodes a list from either a JSON or CSV string.
*/
- @Nullable
@VisibleForTesting
- static List<String> decodeSortOrder(@Nullable final String sortOrderString)
+ static List<String> decodeList(final String keyName, @Nullable final String
listString)
{
- if (sortOrderString == null) {
+ if (listString == null) {
return Collections.emptyList();
- } else if (LOOKS_LIKE_JSON_ARRAY.matcher(sortOrderString).matches()) {
+ } else if (LOOKS_LIKE_JSON_ARRAY.matcher(listString).matches()) {
try {
// Not caching this ObjectMapper in a static, because we expect to use
it infrequently (once per INSERT
// query that uses this feature) and there is no need to keep it
around longer than that.
- return new ObjectMapper().readValue(sortOrderString, new
TypeReference<List<String>>()
- {
- });
+ return new ObjectMapper().readValue(listString, new
TypeReference<List<String>>() {});
}
catch (JsonProcessingException e) {
- throw QueryContexts.badValueException(CTX_SORT_ORDER, "CSV or JSON
array", sortOrderString);
+ throw QueryContexts.badValueException(keyName, "CSV or JSON array",
listString);
}
} else {
final RFC4180Parser csvParser = new
RFC4180ParserBuilder().withSeparator(',').build();
try {
- return Arrays.stream(csvParser.parseLine(sortOrderString))
+ return Arrays.stream(csvParser.parseLine(listString))
.filter(s -> s != null && !s.isEmpty())
.map(String::trim)
.collect(Collectors.toList());
}
catch (IOException e) {
- throw QueryContexts.badValueException(CTX_SORT_ORDER, "CSV or JSON
array", sortOrderString);
+ throw QueryContexts.badValueException(keyName, "CSV or JSON array",
listString);
}
}
}
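A small sketch of the two encodings decodeList accepts, per the code above.
Note that decodeList is package-private and @VisibleForTesting, so this is
illustrative rather than public API.

```java
import java.util.List;

// Both forms decode to the same list: [dim3, dim4].
List<String> fromCsv =
    MultiStageQueryContext.decodeList("skipTypeVerification", "dim3, dim4");
List<String> fromJson =
    MultiStageQueryContext.decodeList("skipTypeVerification", "[\"dim3\", \"dim4\"]");
// A value that parses as neither CSV nor a JSON array throws the
// "CSV or JSON array" bad-value exception shown above.
```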
diff --git
a/extensions-core/multi-stage-query/src/test/java/org/apache/druid/msq/exec/MSQArraysTest.java
b/extensions-core/multi-stage-query/src/test/java/org/apache/druid/msq/exec/MSQArraysTest.java
index 456c74c29bc..cbf2d68a285 100644
---
a/extensions-core/multi-stage-query/src/test/java/org/apache/druid/msq/exec/MSQArraysTest.java
+++
b/extensions-core/multi-stage-query/src/test/java/org/apache/druid/msq/exec/MSQArraysTest.java
@@ -21,10 +21,12 @@ package org.apache.druid.msq.exec;
import com.google.common.collect.ImmutableList;
import com.google.common.collect.ImmutableSet;
+import org.apache.druid.common.config.NullHandling;
import org.apache.druid.data.input.impl.InlineInputSource;
import org.apache.druid.data.input.impl.JsonInputFormat;
import org.apache.druid.data.input.impl.LocalInputSource;
import org.apache.druid.data.input.impl.systemfield.SystemFields;
+import org.apache.druid.error.DruidException;
import org.apache.druid.java.util.common.ISE;
import org.apache.druid.java.util.common.Intervals;
import org.apache.druid.msq.indexing.MSQSpec;
@@ -111,14 +113,14 @@ public class MSQArraysTest extends MSQTestBase
dataFileNameJsonString =
queryFramework().queryJsonMapper().writeValueAsString(dataFile);
RowSignature dataFileSignature = RowSignature.builder()
- .add("timestamp",
ColumnType.STRING)
- .add("arrayString",
ColumnType.STRING_ARRAY)
- .add("arrayStringNulls",
ColumnType.STRING_ARRAY)
- .add("arrayLong",
ColumnType.LONG_ARRAY)
- .add("arrayLongNulls",
ColumnType.LONG_ARRAY)
- .add("arrayDouble",
ColumnType.DOUBLE_ARRAY)
- .add("arrayDoubleNulls",
ColumnType.DOUBLE_ARRAY)
- .build();
+ .add("timestamp",
ColumnType.STRING)
+ .add("arrayString",
ColumnType.STRING_ARRAY)
+ .add("arrayStringNulls",
ColumnType.STRING_ARRAY)
+ .add("arrayLong",
ColumnType.LONG_ARRAY)
+ .add("arrayLongNulls",
ColumnType.LONG_ARRAY)
+ .add("arrayDouble",
ColumnType.DOUBLE_ARRAY)
+ .add("arrayDoubleNulls",
ColumnType.DOUBLE_ARRAY)
+ .build();
dataFileSignatureJsonString =
queryFramework().queryJsonMapper().writeValueAsString(dataFileSignature);
dataFileExternalDataSource = new ExternalDataSource(
@@ -150,6 +152,171 @@ public class MSQArraysTest extends MSQTestBase
.verifyExecutionError();
}
+ /**
+ * Tests the behaviour of a REPLACE query when arrayIngestMode is set to
array and the user tries to write a
+ * string array into an existing multi-value string column
+ */
+ @Test
+ public void testReplaceMvdWithStringArray()
+ {
+ final Map<String, Object> adjustedContext = new HashMap<>(context);
+ adjustedContext.put(MultiStageQueryContext.CTX_ARRAY_INGEST_MODE, "array");
+
+ testIngestQuery()
+ .setSql(
+ "REPLACE INTO foo OVERWRITE ALL\n"
+ + "SELECT MV_TO_ARRAY(dim3) AS dim3 FROM foo\n"
+ + "PARTITIONED BY ALL TIME"
+ )
+ .setQueryContext(adjustedContext)
+ .setExpectedExecutionErrorMatcher(CoreMatchers.allOf(
+ CoreMatchers.instanceOf(DruidException.class),
+ ThrowableMessageMatcher.hasMessage(CoreMatchers.startsWith(
+ "Cannot write into field[dim3] using type[VARCHAR ARRAY] and
arrayIngestMode[array], "
+ + "since the existing type is[VARCHAR]"))
+ ))
+ .verifyExecutionError();
+ }
+
+ /**
+ * Tests the behaviour of a REPLACE query when arrayIngestMode is set to
array and the user tries to write a
+ * multi-value string (via ARRAY_TO_MV) into an existing string array column
+ */
+ @Test
+ public void testReplaceStringArrayWithMvdInArrayMode()
+ {
+ final Map<String, Object> adjustedContext = new HashMap<>(context);
+ adjustedContext.put(MultiStageQueryContext.CTX_ARRAY_INGEST_MODE, "array");
+
+ testIngestQuery()
+ .setSql(
+ "REPLACE INTO arrays OVERWRITE ALL\n"
+ + "SELECT ARRAY_TO_MV(arrayString) AS arrayString FROM arrays\n"
+ + "PARTITIONED BY ALL TIME"
+ )
+ .setQueryContext(adjustedContext)
+ .setExpectedExecutionErrorMatcher(CoreMatchers.allOf(
+ CoreMatchers.instanceOf(DruidException.class),
+ ThrowableMessageMatcher.hasMessage(CoreMatchers.startsWith(
+ "Cannot write into field[arrayString] using type[VARCHAR] and
arrayIngestMode[array], since the "
+ + "existing type is[VARCHAR ARRAY]. Try adjusting your query
to make this column an ARRAY instead "
+ + "of VARCHAR."))
+ ))
+ .verifyExecutionError();
+ }
+
+ /**
+ * Tests the behaviour of a REPLACE query when arrayIngestMode is set to mvd
and the user tries to write a
+ * multi-value string (via ARRAY_TO_MV) into an existing string array column
+ */
+ @Test
+ public void testReplaceStringArrayWithMvdInMvdMode()
+ {
+ final Map<String, Object> adjustedContext = new HashMap<>(context);
+ adjustedContext.put(MultiStageQueryContext.CTX_ARRAY_INGEST_MODE, "mvd");
+
+ testIngestQuery()
+ .setSql(
+ "REPLACE INTO arrays OVERWRITE ALL\n"
+ + "SELECT ARRAY_TO_MV(arrayString) AS arrayString FROM arrays\n"
+ + "PARTITIONED BY ALL TIME"
+ )
+ .setQueryContext(adjustedContext)
+ .setExpectedExecutionErrorMatcher(CoreMatchers.allOf(
+ CoreMatchers.instanceOf(DruidException.class),
+ ThrowableMessageMatcher.hasMessage(CoreMatchers.startsWith(
+ "Cannot write into field[arrayString] using type[VARCHAR] and
arrayIngestMode[mvd], since the "
+ + "existing type is[VARCHAR ARRAY]. Try setting
arrayIngestMode to[array] and adjusting your query to "
+ + "make this column an ARRAY instead of VARCHAR."))
+ ))
+ .verifyExecutionError();
+ }
+
+ /**
+ * Tests the behaviour of a REPLACE query that writes a string array into an
existing multi-value string column
+ * while the target column is listed under skipTypeVerification, so
validation is skipped and the write succeeds
+ */
+ @Test
+ public void testReplaceMvdWithStringArraySkipValidation()
+ {
+ final Map<String, Object> adjustedContext = new HashMap<>(context);
+ adjustedContext.put(MultiStageQueryContext.CTX_ARRAY_INGEST_MODE, "array");
+ adjustedContext.put(MultiStageQueryContext.CTX_SKIP_TYPE_VERIFICATION,
"dim3");
+
+ RowSignature rowSignature = RowSignature.builder()
+ .add("__time", ColumnType.LONG)
+ .add("dim3",
ColumnType.STRING_ARRAY)
+ .build();
+
+ testIngestQuery()
+ .setSql(
+ "REPLACE INTO foo OVERWRITE ALL\n"
+ + "SELECT MV_TO_ARRAY(dim3) AS dim3 FROM foo\n"
+ + "PARTITIONED BY ALL TIME"
+ )
+ .setQueryContext(adjustedContext)
+ .setExpectedDataSource("foo")
+ .setExpectedRowSignature(rowSignature)
+ .setExpectedSegment(ImmutableSet.of(SegmentId.of("foo",
Intervals.ETERNITY, "test", 0)))
+ .setExpectedResultRows(
+ NullHandling.sqlCompatible()
+ ? ImmutableList.of(
+ new Object[]{0L, null},
+ new Object[]{0L, null},
+ new Object[]{0L, new Object[]{"a", "b"}},
+ new Object[]{0L, new Object[]{""}},
+ new Object[]{0L, new Object[]{"b", "c"}},
+ new Object[]{0L, new Object[]{"d"}}
+ )
+ : ImmutableList.of(
+ new Object[]{0L, null},
+ new Object[]{0L, null},
+ new Object[]{0L, null},
+ new Object[]{0L, new Object[]{"a", "b"}},
+ new Object[]{0L, new Object[]{"b", "c"}},
+ new Object[]{0L, new Object[]{"d"}}
+ )
+ )
+ .verifyResults();
+ }
+
+ /**
+ * Tests the behaviour of a REPLACE query that writes a multi-value string
back into an existing multi-value
+ * string column, which passes validation even with arrayIngestMode set to
array
+ */
+ @Test
+ public void testReplaceMvdWithMvd()
+ {
+ final Map<String, Object> adjustedContext = new HashMap<>(context);
+ adjustedContext.put(MultiStageQueryContext.CTX_ARRAY_INGEST_MODE, "array");
+
+ RowSignature rowSignature = RowSignature.builder()
+ .add("__time", ColumnType.LONG)
+ .add("dim3", ColumnType.STRING)
+ .build();
+
+ testIngestQuery()
+ .setSql(
+ "REPLACE INTO foo OVERWRITE ALL\n"
+ + "SELECT dim3 FROM foo\n"
+ + "PARTITIONED BY ALL TIME"
+ )
+ .setQueryContext(adjustedContext)
+ .setExpectedDataSource("foo")
+ .setExpectedRowSignature(rowSignature)
+ .setExpectedSegment(ImmutableSet.of(SegmentId.of("foo",
Intervals.ETERNITY, "test", 0)))
+ .setExpectedResultRows(
+ ImmutableList.of(
+ new Object[]{0L, null},
+ new Object[]{0L, null},
+ new Object[]{0L, NullHandling.sqlCompatible() ? "" : null},
+ new Object[]{0L, ImmutableList.of("a", "b")},
+ new Object[]{0L, ImmutableList.of("b", "c")},
+ new Object[]{0L, "d"}
+ )
+ )
+ .verifyResults();
+ }
/**
* Tests the behaviour of INSERT query when arrayIngestMode is set to mvd
(default) and the only array type to be
@@ -475,7 +642,7 @@ public class MSQArraysTest extends MSQTestBase
null,
Arrays.asList(3.3d, 4.4d, 5.5d),
Arrays.asList(999.0d, null, 5.5d),
- },
+ },
new Object[]{
1672531200000L,
Arrays.asList("b", "c"),
@@ -583,7 +750,7 @@ public class MSQArraysTest extends MSQTestBase
Arrays.asList(2L, 3L),
null,
Arrays.asList(null, 1.1d),
- }
+ }
);
RowSignature rowSignatureWithoutTimeColumn =
diff --git
a/extensions-core/multi-stage-query/src/test/java/org/apache/druid/msq/exec/MSQInsertTest.java
b/extensions-core/multi-stage-query/src/test/java/org/apache/druid/msq/exec/MSQInsertTest.java
index ebee2e042a2..6ce1785672f 100644
---
a/extensions-core/multi-stage-query/src/test/java/org/apache/druid/msq/exec/MSQInsertTest.java
+++
b/extensions-core/multi-stage-query/src/test/java/org/apache/druid/msq/exec/MSQInsertTest.java
@@ -1215,7 +1215,7 @@ public class MSQInsertTest extends MSQTestBase
DruidException.Persona.USER,
DruidException.Category.INVALID_INPUT,
"invalidInput"
- ).expectMessageIs("Field [__time] was the wrong type [VARCHAR],
expected TIMESTAMP")
+ ).expectMessageIs("Field[__time] was the wrong type[VARCHAR],
expected TIMESTAMP")
)
.verifyPlanningErrors();
}
diff --git
a/extensions-core/multi-stage-query/src/test/java/org/apache/druid/msq/test/CalciteMSQTestsHelper.java
b/extensions-core/multi-stage-query/src/test/java/org/apache/druid/msq/test/CalciteMSQTestsHelper.java
index f6297b28c64..701033c49d5 100644
---
a/extensions-core/multi-stage-query/src/test/java/org/apache/druid/msq/test/CalciteMSQTestsHelper.java
+++
b/extensions-core/multi-stage-query/src/test/java/org/apache/druid/msq/test/CalciteMSQTestsHelper.java
@@ -76,7 +76,6 @@ import
org.apache.druid.segment.writeout.OffHeapMemorySegmentWriteOutMediumFacto
import org.apache.druid.server.SegmentManager;
import org.apache.druid.server.coordination.DataSegmentAnnouncer;
import org.apache.druid.server.coordination.NoopDataSegmentAnnouncer;
-import org.apache.druid.sql.calcite.CalciteArraysQueryTest;
import org.apache.druid.sql.calcite.util.CalciteTests;
import org.apache.druid.sql.calcite.util.TestDataBuilder;
import org.apache.druid.timeline.DataSegment;
@@ -92,6 +91,7 @@ import java.util.List;
import java.util.Set;
import java.util.function.Supplier;
+import static org.apache.druid.sql.calcite.util.CalciteTests.ARRAYS_DATASOURCE;
import static org.apache.druid.sql.calcite.util.CalciteTests.DATASOURCE1;
import static org.apache.druid.sql.calcite.util.CalciteTests.DATASOURCE2;
import static org.apache.druid.sql.calcite.util.CalciteTests.DATASOURCE3;
@@ -287,7 +287,7 @@ public class CalciteMSQTestsHelper
.rows(ROWS_LOTS_OF_COLUMNS)
.buildMMappedIndex();
break;
- case CalciteArraysQueryTest.DATA_SOURCE_ARRAYS:
+ case ARRAYS_DATASOURCE:
index = IndexBuilder.create()
.tmpDir(temporaryFolder.newFolder())
.segmentWriteOutMediumFactory(OffHeapMemorySegmentWriteOutMediumFactory.instance())
diff --git
a/extensions-core/multi-stage-query/src/test/java/org/apache/druid/msq/util/DimensionSchemaUtilsTest.java
b/extensions-core/multi-stage-query/src/test/java/org/apache/druid/msq/util/DimensionSchemaUtilsTest.java
index a82f5a35f9c..0a4e3ddbd81 100644
---
a/extensions-core/multi-stage-query/src/test/java/org/apache/druid/msq/util/DimensionSchemaUtilsTest.java
+++
b/extensions-core/multi-stage-query/src/test/java/org/apache/druid/msq/util/DimensionSchemaUtilsTest.java
@@ -179,19 +179,19 @@ public class DimensionSchemaUtilsTest
DruidException.class,
() -> DimensionSchemaUtils.createDimensionSchema("x",
ColumnType.LONG_ARRAY, false, ArrayIngestMode.MVD)
);
- Assert.assertEquals("Numeric arrays can only be ingested when
'arrayIngestMode' is set to 'array' in the MSQ query's context. Current value
of the parameter [mvd]", t.getMessage());
+ Assert.assertEquals("Numeric arrays can only be ingested when
'arrayIngestMode' is set to 'array'. Current value of the parameter is[mvd]",
t.getMessage());
t = Assert.assertThrows(
DruidException.class,
() -> DimensionSchemaUtils.createDimensionSchema("x",
ColumnType.DOUBLE_ARRAY, false, ArrayIngestMode.MVD)
);
- Assert.assertEquals("Numeric arrays can only be ingested when
'arrayIngestMode' is set to 'array' in the MSQ query's context. Current value
of the parameter [mvd]", t.getMessage());
+ Assert.assertEquals("Numeric arrays can only be ingested when
'arrayIngestMode' is set to 'array'. Current value of the parameter is[mvd]",
t.getMessage());
t = Assert.assertThrows(
DruidException.class,
() -> DimensionSchemaUtils.createDimensionSchema("x",
ColumnType.FLOAT_ARRAY, false, ArrayIngestMode.MVD)
);
- Assert.assertEquals("Numeric arrays can only be ingested when
'arrayIngestMode' is set to 'array' in the MSQ query's context. Current value
of the parameter [mvd]", t.getMessage());
+ Assert.assertEquals("Numeric arrays can only be ingested when
'arrayIngestMode' is set to 'array'. Current value of the parameter is[mvd]",
t.getMessage());
}
@Test
diff --git a/extensions-core/multi-stage-query/src/test/java/org/apache/druid/msq/util/MultiStageQueryContextTest.java b/extensions-core/multi-stage-query/src/test/java/org/apache/druid/msq/util/MultiStageQueryContextTest.java
index c2b401c70db..9f24a8b4331 100644
--- a/extensions-core/multi-stage-query/src/test/java/org/apache/druid/msq/util/MultiStageQueryContextTest.java
+++ b/extensions-core/multi-stage-query/src/test/java/org/apache/druid/msq/util/MultiStageQueryContextTest.java
@@ -314,7 +314,7 @@ public class MultiStageQueryContextTest
private static List<String> decodeSortOrder(@Nullable final String input)
{
- return MultiStageQueryContext.decodeSortOrder(input);
+    return MultiStageQueryContext.decodeList(MultiStageQueryContext.CTX_SORT_ORDER, input);
}
  private static IndexSpec decodeIndexSpec(@Nullable final Object inputSpecObject)
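Usage note, hedged: decodeSortOrder was a thin wrapper, and the generic decodeList path it now delegates to is assumed to accept either a JSON array literal or a plain comma-separated string for list-valued context keys, matching the old behavior. A sketch (class name hypothetical):

    import java.util.List;
    import org.apache.druid.msq.util.MultiStageQueryContext;

    public class DecodeListSketch
    {
      public static void main(String[] args)
      {
        // Both encodings are assumed to decode to ["dim1", "dim2"], matching
        // the JSON-or-CSV behavior the old decodeSortOrder wrapper exposed.
        List<String> fromJson = MultiStageQueryContext.decodeList(MultiStageQueryContext.CTX_SORT_ORDER, "[\"dim1\", \"dim2\"]");
        List<String> fromCsv = MultiStageQueryContext.decodeList(MultiStageQueryContext.CTX_SORT_ORDER, "dim1, dim2");
        System.out.println(fromJson.equals(fromCsv)); // expected: true
      }
    }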
diff --git a/sql/src/test/java/org/apache/druid/sql/avatica/DruidAvaticaHandlerTest.java b/sql/src/test/java/org/apache/druid/sql/avatica/DruidAvaticaHandlerTest.java
index 193ba0c7f1b..282c70d5772 100644
--- a/sql/src/test/java/org/apache/druid/sql/avatica/DruidAvaticaHandlerTest.java
+++ b/sql/src/test/java/org/apache/druid/sql/avatica/DruidAvaticaHandlerTest.java
@@ -524,6 +524,12 @@ public class DruidAvaticaHandlerTest extends CalciteTestBase
final DatabaseMetaData metaData = client.getMetaData();
Assert.assertEquals(
ImmutableList.of(
+ row(
+ Pair.of("TABLE_CAT", "druid"),
+ Pair.of("TABLE_NAME", CalciteTests.ARRAYS_DATASOURCE),
+ Pair.of("TABLE_SCHEM", "druid"),
+ Pair.of("TABLE_TYPE", "TABLE")
+ ),
row(
Pair.of("TABLE_CAT", "druid"),
Pair.of("TABLE_NAME", CalciteTests.BROADCAST_DATASOURCE),
@@ -605,6 +611,12 @@ public class DruidAvaticaHandlerTest extends CalciteTestBase
final DatabaseMetaData metaData = superuserClient.getMetaData();
Assert.assertEquals(
ImmutableList.of(
+ row(
+ Pair.of("TABLE_CAT", "druid"),
+ Pair.of("TABLE_NAME", CalciteTests.ARRAYS_DATASOURCE),
+ Pair.of("TABLE_SCHEM", "druid"),
+ Pair.of("TABLE_TYPE", "TABLE")
+ ),
row(
Pair.of("TABLE_CAT", "druid"),
Pair.of("TABLE_NAME", CalciteTests.BROADCAST_DATASOURCE),
diff --git a/sql/src/test/java/org/apache/druid/sql/calcite/CalciteArraysQueryTest.java b/sql/src/test/java/org/apache/druid/sql/calcite/CalciteArraysQueryTest.java
index e4f609ef9ff..fa855e4279c 100644
--- a/sql/src/test/java/org/apache/druid/sql/calcite/CalciteArraysQueryTest.java
+++ b/sql/src/test/java/org/apache/druid/sql/calcite/CalciteArraysQueryTest.java
@@ -22,9 +22,7 @@ package org.apache.druid.sql.calcite;
import com.google.common.collect.ImmutableList;
import com.google.common.collect.ImmutableMap;
import com.google.common.collect.ImmutableSet;
-import com.google.inject.Injector;
import org.apache.druid.common.config.NullHandling;
-import org.apache.druid.data.input.ResourceInputSource;
import org.apache.druid.guice.DruidInjectorBuilder;
import org.apache.druid.guice.NestedDataModule;
import org.apache.druid.java.util.common.HumanReadableBytes;
@@ -32,17 +30,13 @@ import org.apache.druid.java.util.common.Intervals;
import org.apache.druid.java.util.common.StringUtils;
import org.apache.druid.java.util.common.granularity.Granularities;
import org.apache.druid.math.expr.ExprMacroTable;
-import org.apache.druid.query.DataSource;
import org.apache.druid.query.Druids;
import org.apache.druid.query.FilteredDataSource;
-import org.apache.druid.query.FrameBasedInlineDataSource;
import org.apache.druid.query.InlineDataSource;
import org.apache.druid.query.LookupDataSource;
-import org.apache.druid.query.NestedDataTestUtils;
import org.apache.druid.query.Query;
import org.apache.druid.query.QueryContexts;
import org.apache.druid.query.QueryDataSource;
-import org.apache.druid.query.QueryRunnerFactoryConglomerate;
import org.apache.druid.query.TableDataSource;
import org.apache.druid.query.UnnestDataSource;
import org.apache.druid.query.aggregation.CountAggregatorFactory;
@@ -63,37 +57,20 @@ import org.apache.druid.query.groupby.having.DimFilterHavingSpec;
import org.apache.druid.query.groupby.orderby.DefaultLimitSpec;
import org.apache.druid.query.groupby.orderby.NoopLimitSpec;
import org.apache.druid.query.groupby.orderby.OrderByColumnSpec;
-import org.apache.druid.query.lookup.LookupExtractorFactoryContainerProvider;
import org.apache.druid.query.ordering.StringComparators;
import org.apache.druid.query.scan.ScanQuery;
import org.apache.druid.query.spec.MultipleIntervalSegmentSpec;
import org.apache.druid.query.topn.DimensionTopNMetricSpec;
import org.apache.druid.query.topn.TopNQueryBuilder;
-import org.apache.druid.segment.FrameBasedInlineSegmentWrangler;
-import org.apache.druid.segment.IndexBuilder;
-import org.apache.druid.segment.InlineSegmentWrangler;
-import org.apache.druid.segment.LookupSegmentWrangler;
-import org.apache.druid.segment.MapSegmentWrangler;
-import org.apache.druid.segment.QueryableIndex;
-import org.apache.druid.segment.SegmentWrangler;
import org.apache.druid.segment.column.ColumnType;
import org.apache.druid.segment.column.RowSignature;
-import org.apache.druid.segment.incremental.IncrementalIndexSchema;
import org.apache.druid.segment.join.JoinType;
-import org.apache.druid.segment.join.JoinableFactoryWrapper;
import org.apache.druid.segment.virtual.ExpressionVirtualColumn;
-import org.apache.druid.segment.writeout.OffHeapMemorySegmentWriteOutMediumFactory;
-import org.apache.druid.server.QueryStackTests;
-import org.apache.druid.server.SpecificSegmentsQuerySegmentWalker;
import org.apache.druid.sql.calcite.filtration.Filtration;
import org.apache.druid.sql.calcite.util.CalciteTests;
-import org.apache.druid.sql.calcite.util.TestDataBuilder;
-import org.apache.druid.timeline.DataSegment;
-import org.apache.druid.timeline.partition.LinearShardSpec;
import org.junit.Assert;
import org.junit.Test;
-import java.io.IOException;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
@@ -110,9 +87,6 @@ public class CalciteArraysQueryTest extends BaseCalciteQueryTest
.put(QueryContexts.CTX_SQL_STRINGIFY_ARRAYS, false)
.build();
-
- public static final String DATA_SOURCE_ARRAYS = "arrays";
-
  public static void assertResultsDeepEquals(String sql, List<Object[]> expected, List<Object[]> results)
{
for (int row = 0; row < results.size(); row++) {
@@ -144,120 +118,6 @@ public class CalciteArraysQueryTest extends BaseCalciteQueryTest
builder.addModule(new NestedDataModule());
}
- @SuppressWarnings("resource")
- @Override
- public SpecificSegmentsQuerySegmentWalker createQuerySegmentWalker(
- final QueryRunnerFactoryConglomerate conglomerate,
- final JoinableFactoryWrapper joinableFactory,
- final Injector injector
- ) throws IOException
- {
- NestedDataModule.registerHandlersAndSerde();
-
- final QueryableIndex foo = IndexBuilder
- .create()
- .tmpDir(temporaryFolder.newFolder())
-        .segmentWriteOutMediumFactory(OffHeapMemorySegmentWriteOutMediumFactory.instance())
- .schema(TestDataBuilder.INDEX_SCHEMA)
- .rows(TestDataBuilder.ROWS1)
- .buildMMappedIndex();
-
- final QueryableIndex numfoo = IndexBuilder
- .create()
- .tmpDir(temporaryFolder.newFolder())
-        .segmentWriteOutMediumFactory(OffHeapMemorySegmentWriteOutMediumFactory.instance())
- .schema(TestDataBuilder.INDEX_SCHEMA_NUMERIC_DIMS)
- .rows(TestDataBuilder.ROWS1_WITH_NUMERIC_DIMS)
- .buildMMappedIndex();
-
- final QueryableIndex indexLotsOfColumns = IndexBuilder
- .create()
- .tmpDir(temporaryFolder.newFolder())
-        .segmentWriteOutMediumFactory(OffHeapMemorySegmentWriteOutMediumFactory.instance())
- .schema(TestDataBuilder.INDEX_SCHEMA_LOTS_O_COLUMNS)
- .rows(TestDataBuilder.ROWS_LOTS_OF_COLUMNS)
- .buildMMappedIndex();
-
- final QueryableIndex indexArrays =
- IndexBuilder.create()
- .tmpDir(temporaryFolder.newFolder())
-                    .segmentWriteOutMediumFactory(OffHeapMemorySegmentWriteOutMediumFactory.instance())
- .schema(
- new IncrementalIndexSchema.Builder()
-                            .withTimestampSpec(NestedDataTestUtils.AUTO_SCHEMA.getTimestampSpec())
-                            .withDimensionsSpec(NestedDataTestUtils.AUTO_SCHEMA.getDimensionsSpec())
- .withMetrics(
- new CountAggregatorFactory("cnt")
- )
- .withRollup(false)
- .build()
- )
- .inputSource(
- ResourceInputSource.of(
- NestedDataTestUtils.class.getClassLoader(),
- NestedDataTestUtils.ARRAY_TYPES_DATA_FILE
- )
- )
- .inputFormat(TestDataBuilder.DEFAULT_JSON_INPUT_FORMAT)
- .inputTmpDir(temporaryFolder.newFolder())
- .buildMMappedIndex();
-
-    SpecificSegmentsQuerySegmentWalker walker = new SpecificSegmentsQuerySegmentWalker(
- conglomerate,
- new MapSegmentWrangler(
-            ImmutableMap.<Class<? extends DataSource>, SegmentWrangler>builder()
-                        .put(InlineDataSource.class, new InlineSegmentWrangler())
-                        .put(FrameBasedInlineDataSource.class, new FrameBasedInlineSegmentWrangler())
-                        .put(
-                            LookupDataSource.class,
-                            new LookupSegmentWrangler(injector.getInstance(LookupExtractorFactoryContainerProvider.class))
-                        )
-                        .build()
- ),
- joinableFactory,
- QueryStackTests.DEFAULT_NOOP_SCHEDULER
- );
- walker.add(
- DataSegment.builder()
- .dataSource(CalciteTests.DATASOURCE1)
- .interval(foo.getDataInterval())
- .version("1")
- .shardSpec(new LinearShardSpec(0))
- .size(0)
- .build(),
- foo
- ).add(
- DataSegment.builder()
- .dataSource(CalciteTests.DATASOURCE3)
- .interval(numfoo.getDataInterval())
- .version("1")
- .shardSpec(new LinearShardSpec(0))
- .size(0)
- .build(),
- numfoo
- ).add(
- DataSegment.builder()
- .dataSource(CalciteTests.DATASOURCE5)
- .interval(indexLotsOfColumns.getDataInterval())
- .version("1")
- .shardSpec(new LinearShardSpec(0))
- .size(0)
- .build(),
- indexLotsOfColumns
- ).add(
- DataSegment.builder()
- .dataSource(DATA_SOURCE_ARRAYS)
- .version("1")
- .interval(indexArrays.getDataInterval())
- .shardSpec(new LinearShardSpec(1))
- .size(0)
- .build(),
- indexArrays
- );
-
- return walker;
- }
-
  // test some query stuffs, sort of limited since no native array column types so either need to use constructor or
  // array aggregator
@Test
@@ -320,7 +180,7 @@ public class CalciteArraysQueryTest extends BaseCalciteQueryTest
QUERY_CONTEXT_NO_STRINGIFY_ARRAY,
ImmutableList.of(
GroupByQuery.builder()
- .setDataSource(DATA_SOURCE_ARRAYS)
+ .setDataSource(CalciteTests.ARRAYS_DATASOURCE)
.setInterval(querySegmentSpec(Filtration.eternity()))
.setVirtualColumns(expressionVirtualColumn(
"v0",
@@ -645,7 +505,7 @@ public class CalciteArraysQueryTest extends BaseCalciteQueryTest
+ " FROM druid.arrays",
ImmutableList.of(
newScanQueryBuilder()
- .dataSource(DATA_SOURCE_ARRAYS)
+ .dataSource(CalciteTests.ARRAYS_DATASOURCE)
.intervals(querySegmentSpec(Filtration.eternity()))
.virtualColumns(
                // these report as strings even though they are not, someday this will not be so
@@ -861,7 +721,7 @@ public class CalciteArraysQueryTest extends BaseCalciteQueryTest
        "SELECT arrayStringNulls FROM druid.arrays WHERE ARRAY_OVERLAP(arrayStringNulls, ARRAY['a','b']) LIMIT 5",
ImmutableList.of(
newScanQueryBuilder()
- .dataSource(DATA_SOURCE_ARRAYS)
+ .dataSource(CalciteTests.ARRAYS_DATASOURCE)
.intervals(querySegmentSpec(Filtration.eternity()))
.filters(
or(
@@ -892,7 +752,7 @@ public class CalciteArraysQueryTest extends BaseCalciteQueryTest
        "SELECT arrayLongNulls FROM druid.arrays WHERE ARRAY_OVERLAP(arrayLongNulls, ARRAY[1, 2]) LIMIT 5",
ImmutableList.of(
newScanQueryBuilder()
- .dataSource(DATA_SOURCE_ARRAYS)
+ .dataSource(CalciteTests.ARRAYS_DATASOURCE)
.intervals(querySegmentSpec(Filtration.eternity()))
.filters(
or(
@@ -923,7 +783,7 @@ public class CalciteArraysQueryTest extends BaseCalciteQueryTest
        "SELECT arrayDoubleNulls FROM druid.arrays WHERE ARRAY_OVERLAP(arrayDoubleNulls, ARRAY[1.1, 2.2]) LIMIT 5",
ImmutableList.of(
newScanQueryBuilder()
- .dataSource(DATA_SOURCE_ARRAYS)
+ .dataSource(CalciteTests.ARRAYS_DATASOURCE)
.intervals(querySegmentSpec(Filtration.eternity()))
.filters(
or(
@@ -1001,7 +861,7 @@ public class CalciteArraysQueryTest extends BaseCalciteQueryTest
        "SELECT arrayStringNulls, arrayString FROM druid.arrays WHERE ARRAY_OVERLAP(arrayStringNulls, arrayString) LIMIT 5",
ImmutableList.of(
newScanQueryBuilder()
- .dataSource(DATA_SOURCE_ARRAYS)
+ .dataSource(CalciteTests.ARRAYS_DATASOURCE)
.intervals(querySegmentSpec(Filtration.eternity()))
.filters(expressionFilter("array_overlap(\"arrayStringNulls\",\"arrayString\")"))
.columns("arrayString", "arrayStringNulls")
@@ -1027,7 +887,7 @@ public class CalciteArraysQueryTest extends BaseCalciteQueryTest
        "SELECT arrayLongNulls, arrayLong FROM druid.arrays WHERE ARRAY_OVERLAP(arrayLongNulls, arrayLong) LIMIT 5",
ImmutableList.of(
newScanQueryBuilder()
- .dataSource(DATA_SOURCE_ARRAYS)
+ .dataSource(CalciteTests.ARRAYS_DATASOURCE)
.intervals(querySegmentSpec(Filtration.eternity()))
.filters(expressionFilter("array_overlap(\"arrayLongNulls\",\"arrayLong\")"))
.columns("arrayLong", "arrayLongNulls")
@@ -1053,7 +913,7 @@ public class CalciteArraysQueryTest extends BaseCalciteQueryTest
        "SELECT arrayDoubleNulls, arrayDouble FROM druid.arrays WHERE ARRAY_OVERLAP(arrayDoubleNulls, arrayDouble) LIMIT 5",
ImmutableList.of(
newScanQueryBuilder()
- .dataSource(DATA_SOURCE_ARRAYS)
+ .dataSource(CalciteTests.ARRAYS_DATASOURCE)
.intervals(querySegmentSpec(Filtration.eternity()))
.filters(expressionFilter("array_overlap(\"arrayDoubleNulls\",\"arrayDouble\")"))
.columns("arrayDouble", "arrayDoubleNulls")
@@ -1105,7 +965,7 @@ public class CalciteArraysQueryTest extends BaseCalciteQueryTest
        "SELECT arrayStringNulls FROM druid.arrays WHERE ARRAY_CONTAINS(arrayStringNulls, ARRAY['a','b']) LIMIT 5",
ImmutableList.of(
newScanQueryBuilder()
- .dataSource(DATA_SOURCE_ARRAYS)
+ .dataSource(CalciteTests.ARRAYS_DATASOURCE)
.intervals(querySegmentSpec(Filtration.eternity()))
.filters(
and(
@@ -1135,7 +995,7 @@ public class CalciteArraysQueryTest extends BaseCalciteQueryTest
        "SELECT arrayLongNulls FROM druid.arrays WHERE ARRAY_CONTAINS(arrayLongNulls, ARRAY[1, null]) LIMIT 5",
ImmutableList.of(
newScanQueryBuilder()
- .dataSource(DATA_SOURCE_ARRAYS)
+ .dataSource(CalciteTests.ARRAYS_DATASOURCE)
.intervals(querySegmentSpec(Filtration.eternity()))
.filters(
and(
@@ -1163,7 +1023,7 @@ public class CalciteArraysQueryTest extends BaseCalciteQueryTest
        "SELECT arrayDoubleNulls FROM druid.arrays WHERE ARRAY_CONTAINS(arrayDoubleNulls, ARRAY[1.1, null]) LIMIT 5",
ImmutableList.of(
newScanQueryBuilder()
- .dataSource(DATA_SOURCE_ARRAYS)
+ .dataSource(CalciteTests.ARRAYS_DATASOURCE)
.intervals(querySegmentSpec(Filtration.eternity()))
.filters(
and(
@@ -1273,7 +1133,7 @@ public class CalciteArraysQueryTest extends BaseCalciteQueryTest
        "SELECT arrayStringNulls, arrayString FROM druid.arrays WHERE ARRAY_CONTAINS(arrayStringNulls, arrayString) LIMIT 5",
ImmutableList.of(
newScanQueryBuilder()
- .dataSource(DATA_SOURCE_ARRAYS)
+ .dataSource(CalciteTests.ARRAYS_DATASOURCE)
.intervals(querySegmentSpec(Filtration.eternity()))
.filters(
expressionFilter("array_contains(\"arrayStringNulls\",\"arrayString\")")
@@ -1297,7 +1157,7 @@ public class CalciteArraysQueryTest extends BaseCalciteQueryTest
        "SELECT arrayLong, arrayLongNulls FROM druid.arrays WHERE ARRAY_CONTAINS(arrayLong, arrayLongNulls) LIMIT 5",
ImmutableList.of(
newScanQueryBuilder()
- .dataSource(DATA_SOURCE_ARRAYS)
+ .dataSource(CalciteTests.ARRAYS_DATASOURCE)
.intervals(querySegmentSpec(Filtration.eternity()))
.filters(
expressionFilter("array_contains(\"arrayLong\",\"arrayLongNulls\")")
@@ -1324,7 +1184,7 @@ public class CalciteArraysQueryTest extends BaseCalciteQueryTest
        "SELECT arrayDoubleNulls, arrayDouble FROM druid.arrays WHERE ARRAY_CONTAINS(arrayDoubleNulls, arrayDouble) LIMIT 5",
ImmutableList.of(
newScanQueryBuilder()
- .dataSource(DATA_SOURCE_ARRAYS)
+ .dataSource(CalciteTests.ARRAYS_DATASOURCE)
.intervals(querySegmentSpec(Filtration.eternity()))
.filters(
expressionFilter("array_contains(\"arrayDoubleNulls\",\"arrayDouble\")")
@@ -1375,7 +1235,7 @@ public class CalciteArraysQueryTest extends BaseCalciteQueryTest
QUERY_CONTEXT_NO_STRINGIFY_ARRAY,
ImmutableList.of(
new Druids.ScanQueryBuilder()
- .dataSource(DATA_SOURCE_ARRAYS)
+ .dataSource(CalciteTests.ARRAYS_DATASOURCE)
.intervals(querySegmentSpec(Filtration.eternity()))
.virtualColumns(
expressionVirtualColumn("v0",
"array_slice(\"arrayString\",1)", ColumnType.STRING_ARRAY),
@@ -1460,7 +1320,7 @@ public class CalciteArraysQueryTest extends BaseCalciteQueryTest
        "SELECT arrayStringNulls, ARRAY_LENGTH(arrayStringNulls), SUM(cnt) FROM druid.arrays GROUP BY 1, 2 ORDER BY 2 DESC",
ImmutableList.of(
GroupByQuery.builder()
- .setDataSource(DATA_SOURCE_ARRAYS)
+ .setDataSource(CalciteTests.ARRAYS_DATASOURCE)
.setInterval(querySegmentSpec(Filtration.eternity()))
.setGranularity(Granularities.ALL)
.setVirtualColumns(expressionVirtualColumn("v0",
"array_length(\"arrayStringNulls\")", ColumnType.LONG))
@@ -1840,7 +1700,7 @@ public class CalciteArraysQueryTest extends BaseCalciteQueryTest
QUERY_CONTEXT_NO_STRINGIFY_ARRAY,
ImmutableList.of(
GroupByQuery.builder()
- .setDataSource(DATA_SOURCE_ARRAYS)
+ .setDataSource(CalciteTests.ARRAYS_DATASOURCE)
.setInterval(querySegmentSpec(Filtration.eternity()))
.setGranularity(Granularities.ALL)
.setDimensions(
@@ -1937,7 +1797,7 @@ public class CalciteArraysQueryTest extends BaseCalciteQueryTest
QUERY_CONTEXT_NO_STRINGIFY_ARRAY,
ImmutableList.of(
GroupByQuery.builder()
- .setDataSource(DATA_SOURCE_ARRAYS)
+ .setDataSource(CalciteTests.ARRAYS_DATASOURCE)
.setInterval(querySegmentSpec(Filtration.eternity()))
.setGranularity(Granularities.ALL)
.setDimensions(
@@ -2981,7 +2841,7 @@ public class CalciteArraysQueryTest extends BaseCalciteQueryTest
        "SELECT ARRAY_AGG(arrayLongNulls), ARRAY_AGG(DISTINCT arrayDouble), ARRAY_AGG(DISTINCT arrayStringNulls) FILTER(WHERE arrayLong = ARRAY[2,3]) FROM arrays WHERE arrayDoubleNulls is not null",
ImmutableList.of(
Druids.newTimeseriesQueryBuilder()
- .dataSource(DATA_SOURCE_ARRAYS)
+ .dataSource(CalciteTests.ARRAYS_DATASOURCE)
.intervals(querySegmentSpec(Filtration.eternity()))
.granularity(Granularities.ALL)
.filters(notNull("arrayDoubleNulls"))
@@ -3065,7 +2925,7 @@ public class CalciteArraysQueryTest extends BaseCalciteQueryTest
        "SELECT ARRAY_CONCAT_AGG(arrayLongNulls), ARRAY_CONCAT_AGG(DISTINCT arrayDouble), ARRAY_CONCAT_AGG(DISTINCT arrayStringNulls) FILTER(WHERE arrayLong = ARRAY[2,3]) FROM arrays WHERE arrayDoubleNulls is not null",
ImmutableList.of(
Druids.newTimeseriesQueryBuilder()
- .dataSource(DATA_SOURCE_ARRAYS)
+ .dataSource(CalciteTests.ARRAYS_DATASOURCE)
.intervals(querySegmentSpec(Filtration.eternity()))
.granularity(Granularities.ALL)
.filters(notNull("arrayDoubleNulls"))
@@ -3923,7 +3783,7 @@ public class CalciteArraysQueryTest extends BaseCalciteQueryTest
ImmutableList.of(
Druids.newScanQueryBuilder()
.dataSource(UnnestDataSource.create(
- new TableDataSource(DATA_SOURCE_ARRAYS),
+ new TableDataSource(CalciteTests.ARRAYS_DATASOURCE),
expressionVirtualColumn("j0.unnest", "\"arrayString\"",
ColumnType.STRING_ARRAY),
null
))
@@ -3971,7 +3831,7 @@ public class CalciteArraysQueryTest extends BaseCalciteQueryTest
ImmutableList.of(
Druids.newScanQueryBuilder()
.dataSource(UnnestDataSource.create(
- new TableDataSource(DATA_SOURCE_ARRAYS),
+ new TableDataSource(CalciteTests.ARRAYS_DATASOURCE),
expressionVirtualColumn("j0.unnest",
"\"arrayStringNulls\"", ColumnType.STRING_ARRAY),
null
))
@@ -4018,7 +3878,7 @@ public class CalciteArraysQueryTest extends BaseCalciteQueryTest
ImmutableList.of(
Druids.newScanQueryBuilder()
.dataSource(UnnestDataSource.create(
- new TableDataSource(DATA_SOURCE_ARRAYS),
+ new TableDataSource(CalciteTests.ARRAYS_DATASOURCE),
expressionVirtualColumn("j0.unnest", "\"arrayLong\"",
ColumnType.LONG_ARRAY),
null
))
@@ -4072,7 +3932,7 @@ public class CalciteArraysQueryTest extends BaseCalciteQueryTest
ImmutableList.of(
Druids.newScanQueryBuilder()
.dataSource(UnnestDataSource.create(
- new TableDataSource(DATA_SOURCE_ARRAYS),
+ new TableDataSource(CalciteTests.ARRAYS_DATASOURCE),
expressionVirtualColumn("j0.unnest",
"\"arrayLongNulls\"", ColumnType.LONG_ARRAY),
null
))
@@ -4122,7 +3982,7 @@ public class CalciteArraysQueryTest extends BaseCalciteQueryTest
ImmutableList.of(
Druids.newScanQueryBuilder()
.dataSource(UnnestDataSource.create(
- new TableDataSource(DATA_SOURCE_ARRAYS),
+ new TableDataSource(CalciteTests.ARRAYS_DATASOURCE),
expressionVirtualColumn("j0.unnest", "\"arrayDouble\"",
ColumnType.DOUBLE_ARRAY),
null
))
@@ -4176,7 +4036,7 @@ public class CalciteArraysQueryTest extends BaseCalciteQueryTest
ImmutableList.of(
Druids.newScanQueryBuilder()
.dataSource(UnnestDataSource.create(
- new TableDataSource(DATA_SOURCE_ARRAYS),
+ new TableDataSource(CalciteTests.ARRAYS_DATASOURCE),
expressionVirtualColumn("j0.unnest",
"\"arrayDoubleNulls\"", ColumnType.DOUBLE_ARRAY),
null
))
@@ -4312,7 +4172,7 @@ public class CalciteArraysQueryTest extends BaseCalciteQueryTest
.dataSource(
UnnestDataSource.create(
UnnestDataSource.create(
- new TableDataSource(DATA_SOURCE_ARRAYS),
+                            new TableDataSource(CalciteTests.ARRAYS_DATASOURCE),
expressionVirtualColumn(
"j0.unnest",
"\"arrayStringNulls\"",
@@ -4629,7 +4489,7 @@ public class CalciteArraysQueryTest extends BaseCalciteQueryTest
UnnestDataSource.create(
FilteredDataSource.create(
UnnestDataSource.create(
- new TableDataSource(DATA_SOURCE_ARRAYS),
+                                new TableDataSource(CalciteTests.ARRAYS_DATASOURCE),
expressionVirtualColumn(
"j0.unnest",
"\"arrayLongNulls\"",
@@ -4779,7 +4639,7 @@ public class CalciteArraysQueryTest extends BaseCalciteQueryTest
UnnestDataSource.create(
FilteredDataSource.create(
UnnestDataSource.create(
- new TableDataSource(DATA_SOURCE_ARRAYS),
+                                new TableDataSource(CalciteTests.ARRAYS_DATASOURCE),
expressionVirtualColumn(
"j0.unnest",
"\"arrayLongNulls\"",
@@ -4890,7 +4750,7 @@ public class CalciteArraysQueryTest extends BaseCalciteQueryTest
ImmutableList.of(
GroupByQuery.builder()
.setDataSource(UnnestDataSource.create(
- new TableDataSource(DATA_SOURCE_ARRAYS),
+                        new TableDataSource(CalciteTests.ARRAYS_DATASOURCE),
                         expressionVirtualColumn("j0.unnest", "\"arrayStringNulls\"", ColumnType.STRING_ARRAY),
null
))
@@ -6494,7 +6354,7 @@ public class CalciteArraysQueryTest extends BaseCalciteQueryTest
ImmutableList.of(
Druids.newTimeseriesQueryBuilder()
.dataSource(UnnestDataSource.create(
- new TableDataSource(DATA_SOURCE_ARRAYS),
+ new TableDataSource(CalciteTests.ARRAYS_DATASOURCE),
expressionVirtualColumn("j0.unnest",
"\"arrayDoubleNulls\"", ColumnType.DOUBLE_ARRAY),
null
))
@@ -6581,7 +6441,7 @@ public class CalciteArraysQueryTest extends BaseCalciteQueryTest
ImmutableList.of(
GroupByQuery.builder()
.setDataSource(UnnestDataSource.create(
- new TableDataSource(DATA_SOURCE_ARRAYS),
+                        new TableDataSource(CalciteTests.ARRAYS_DATASOURCE),
                         expressionVirtualColumn("j0.unnest", "\"arrayLongNulls\"", ColumnType.LONG_ARRAY),
NullHandling.sqlCompatible()
? or(
@@ -6618,7 +6478,7 @@ public class CalciteArraysQueryTest extends BaseCalciteQueryTest
ImmutableList.of(
GroupByQuery.builder()
.setDataSource(UnnestDataSource.create(
- new TableDataSource(DATA_SOURCE_ARRAYS),
+                        new TableDataSource(CalciteTests.ARRAYS_DATASOURCE),
                         expressionVirtualColumn("j0.unnest", "\"arrayLongNulls\"", ColumnType.LONG_ARRAY),
NullHandling.sqlCompatible()
? or(
@@ -6736,7 +6596,7 @@ public class CalciteArraysQueryTest extends BaseCalciteQueryTest
Druids.newScanQueryBuilder()
.dataSource(UnnestDataSource.create(
FilteredDataSource.create(
- new TableDataSource(DATA_SOURCE_ARRAYS),
+ new TableDataSource(CalciteTests.ARRAYS_DATASOURCE),
range("__time", ColumnType.LONG, 1672617600000L,
1672704600000L, false, false)
),
expressionVirtualColumn("j0.unnest",
"\"arrayStringNulls\"", ColumnType.STRING_ARRAY),
@@ -6995,7 +6855,7 @@ public class CalciteArraysQueryTest extends BaseCalciteQueryTest
.dataSource(
UnnestDataSource.create(
FilteredDataSource.create(
-                                new TableDataSource(DATA_SOURCE_ARRAYS),
+                                new TableDataSource(CalciteTests.ARRAYS_DATASOURCE),
                                 range("__time", ColumnType.LONG, 1672617600000L, 1672704600000L, false, false)
                             ),
                             expressionVirtualColumn("j0.unnest", "\"arrayLongNulls\"", ColumnType.LONG_ARRAY),
diff --git a/sql/src/test/java/org/apache/druid/sql/calcite/CalciteQueryTest.java b/sql/src/test/java/org/apache/druid/sql/calcite/CalciteQueryTest.java
index 8df5d32ddcd..5ec49cf433f 100644
--- a/sql/src/test/java/org/apache/druid/sql/calcite/CalciteQueryTest.java
+++ b/sql/src/test/java/org/apache/druid/sql/calcite/CalciteQueryTest.java
@@ -173,6 +173,7 @@ public class CalciteQueryTest extends BaseCalciteQueryTest
+ "WHERE TABLE_TYPE IN ('SYSTEM_TABLE', 'TABLE', 'VIEW')",
ImmutableList.of(),
ImmutableList.<Object[]>builder()
+                        .add(new Object[]{"druid", CalciteTests.ARRAYS_DATASOURCE, "TABLE", "NO", "NO"})
                         .add(new Object[]{"druid", CalciteTests.BROADCAST_DATASOURCE, "TABLE", "YES", "YES"})
                         .add(new Object[]{"druid", CalciteTests.DATASOURCE1, "TABLE", "NO", "NO"})
                         .add(new Object[]{"druid", CalciteTests.DATASOURCE2, "TABLE", "NO", "NO"})
@@ -213,6 +214,7 @@ public class CalciteQueryTest extends BaseCalciteQueryTest
CalciteTests.SUPER_USER_AUTH_RESULT,
ImmutableList.of(),
ImmutableList.<Object[]>builder()
+                        .add(new Object[]{"druid", CalciteTests.ARRAYS_DATASOURCE, "TABLE", "NO", "NO"})
                         .add(new Object[]{"druid", CalciteTests.BROADCAST_DATASOURCE, "TABLE", "YES", "YES"})
                         .add(new Object[]{"druid", CalciteTests.DATASOURCE1, "TABLE", "NO", "NO"})
                         .add(new Object[]{"druid", CalciteTests.DATASOURCE2, "TABLE", "NO", "NO"})
diff --git a/sql/src/test/java/org/apache/druid/sql/calcite/util/CalciteTests.java b/sql/src/test/java/org/apache/druid/sql/calcite/util/CalciteTests.java
index 342a9384dec..489632070e9 100644
--- a/sql/src/test/java/org/apache/druid/sql/calcite/util/CalciteTests.java
+++ b/sql/src/test/java/org/apache/druid/sql/calcite/util/CalciteTests.java
@@ -111,6 +111,7 @@ public class CalciteTests
public static final String DATASOURCE3 = "numfoo";
public static final String DATASOURCE4 = "foo4";
public static final String DATASOURCE5 = "lotsocolumns";
+ public static final String ARRAYS_DATASOURCE = "arrays";
public static final String BROADCAST_DATASOURCE = "broadcast";
public static final String FORBIDDEN_DATASOURCE = "forbiddenDatasource";
public static final String FORBIDDEN_DESTINATION = "forbiddenDestination";
diff --git a/sql/src/test/java/org/apache/druid/sql/calcite/util/TestDataBuilder.java b/sql/src/test/java/org/apache/druid/sql/calcite/util/TestDataBuilder.java
index 53480e2532d..ec1086bc1d2 100644
--- a/sql/src/test/java/org/apache/druid/sql/calcite/util/TestDataBuilder.java
+++ b/sql/src/test/java/org/apache/druid/sql/calcite/util/TestDataBuilder.java
@@ -42,6 +42,7 @@ import org.apache.druid.java.util.common.parsers.JSONPathSpec;
import org.apache.druid.query.DataSource;
import org.apache.druid.query.GlobalTableDataSource;
import org.apache.druid.query.InlineDataSource;
+import org.apache.druid.query.NestedDataTestUtils;
import org.apache.druid.query.QueryRunnerFactoryConglomerate;
import org.apache.druid.query.aggregation.CountAggregatorFactory;
import org.apache.druid.query.aggregation.DoubleSumAggregatorFactory;
@@ -832,6 +833,30 @@ public class TestDataBuilder
.rows(USER_VISIT_ROWS)
.buildMMappedIndex();
+ final QueryableIndex arraysIndex = IndexBuilder
+ .create()
+ .tmpDir(new File(tmpDir, "9"))
+        .segmentWriteOutMediumFactory(OffHeapMemorySegmentWriteOutMediumFactory.instance())
+ .schema(
+ new IncrementalIndexSchema.Builder()
+                .withTimestampSpec(NestedDataTestUtils.AUTO_SCHEMA.getTimestampSpec())
+                .withDimensionsSpec(NestedDataTestUtils.AUTO_SCHEMA.getDimensionsSpec())
+ .withMetrics(
+ new CountAggregatorFactory("cnt")
+ )
+ .withRollup(false)
+ .build()
+ )
+ .inputSource(
+ ResourceInputSource.of(
+ NestedDataTestUtils.class.getClassLoader(),
+ NestedDataTestUtils.ARRAY_TYPES_DATA_FILE
+ )
+ )
+ .inputFormat(TestDataBuilder.DEFAULT_JSON_INPUT_FORMAT)
+ .inputTmpDir(new File(tmpDir, "9-input"))
+ .buildMMappedIndex();
+
return new SpecificSegmentsQuerySegmentWalker(
conglomerate,
injector.getInstance(SegmentWrangler.class),
@@ -945,6 +970,15 @@ public class TestDataBuilder
.size(0)
.build(),
makeWikipediaIndexWithAggregation(tmpDir)
+ ).add(
+ DataSegment.builder()
+ .dataSource(CalciteTests.ARRAYS_DATASOURCE)
+ .version("1")
+ .interval(arraysIndex.getDataInterval())
+ .shardSpec(new LinearShardSpec(1))
+ .size(0)
+ .build(),
+ arraysIndex
);
}