gianm commented on code in PR #15920:
URL: https://github.com/apache/druid/pull/15920#discussion_r1509444779
##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/sql/MSQTaskSqlEngine.java:
##########
@@ -307,6 +337,124 @@ private static void validateLimitAndOffset(final RelNode
topRel, final boolean l
}
}
+ /**
+ * Validate that the query does not include any type changes from string to
array or vice versa.
+ *
+ * These type changes tend to cause problems due to mixing of multi-value
strings and string arrays. In particular,
+ * many queries written in the "classic MVD" style (treating MVDs as if they
were regular strings) will fail when
+ * MVDs and arrays are mixed. So, we detect them as invalid.
+ *
+ * @param rootRel root rel
+ * @param fieldMappings field mappings from {@link #validateInsert(RelNode,
List, Table, PlannerContext)}
+ * @param targetTable table we are inserting (or replacing) into, if any
+ * @param plannerContext planner context
+ */
+ private static void validateTypeChanges(
+ final RelNode rootRel,
+ final List<Pair<Integer, String>> fieldMappings,
+ @Nullable final Table targetTable,
+ final PlannerContext plannerContext
+ )
+ {
+ if (targetTable == null) {
+ return;
+ }
+
+ final Set<String> columnsExcludedFromTypeVerification =
+
MultiStageQueryContext.getColumnsExcludedFromTypeVerification(plannerContext.queryContext());
+ final ArrayIngestMode arrayIngestMode =
MultiStageQueryContext.getArrayIngestMode(plannerContext.queryContext());
+
+ for (Pair<Integer, String> fieldMapping : fieldMappings) {
+ final int columnIndex = fieldMapping.left;
+ final String columnName = fieldMapping.right;
+ final RelDataTypeField oldSqlTypeField =
+
targetTable.getRowType(DruidTypeSystem.TYPE_FACTORY).getField(columnName, true,
false);
+
+ if (!columnsExcludedFromTypeVerification.contains(columnName) &&
oldSqlTypeField != null) {
+ final ColumnType oldDruidType =
Calcites.getColumnTypeForRelDataType(oldSqlTypeField.getType());
+ final RelDataType newSqlType =
rootRel.getRowType().getFieldList().get(columnIndex).getType();
+ final ColumnType newDruidType =
+
DimensionSchemaUtils.getDimensionType(Calcites.getColumnTypeForRelDataType(newSqlType),
arrayIngestMode);
+
+ if (newDruidType.isArray() && oldDruidType.is(ValueType.STRING)
Review Comment:
This validation is really just to prevent accidents that break queries, due
a pretty unique situation:
- due to the complexity around `arrayIngestMode` it is easy to accidentally
ingest a list of strings as the "wrong" type accidentally (MVD when you meant
ARRAY or vice-versa).
- when a single column has both MVD and ARRAY in it, existing queries can
break for the reasons I mentioned in the PR description.
This situation is problematic but also only really exists for the MVD/array
combinaton. So I don't intent this to be the start of a DDL layer, just a point
fix to this particular problem. I do think we want a catalog layer that can be
part of more extensive type validations, but that should be opt-in &
user-driven & more flexible than some hard-coded checks.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]