gianm commented on a change in pull request #12253:
URL: https://github.com/apache/druid/pull/12253#discussion_r804703490
##########
File path: docs/querying/multi-value-dimensions.md
##########
@@ -375,3 +375,8 @@ This query returns the following result:
Note that, for groupBy queries, you could get similar result with a [having
spec](having.md) but using a filtered
`dimensionSpec` is much more efficient because that gets applied at the lowest
level in the query processing pipeline.
Having specs are applied at the outermost level of groupBy query processing.
+
+## Disable GroupBy on multivalue columns
+
+As grouping on multivalue columns causes implicit unnest, users can avoid this
behaviour by setting
+`groupByEnableMultiValueUnnesting` in the query context to `false`. This will
result the query to error out.
Review comment:
I think we should leave this out of the docs until we the array-based
story is complete. Then we can document this, array types, array functions, etc.
##########
File path:
processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/GroupByQueryEngineV2.java
##########
@@ -327,35 +345,46 @@ public static int getCardinalityForArrayAggregation(
}
/**
- * Checks whether all "dimensions" are either single-valued, or if the input
column or output dimension spec has
- * specified a type that {@link ColumnType#isArray()}. Both cases indicate
we don't want to explode the under-lying
- * multi value column. Since selectors on non-existent columns will show up
as full of nulls, they are effectively
- * single valued, however capabilites on columns can also be null, for
example during broker merge with an 'inline'
- * datasource subquery, so we only return true from this method when
capabilities are fully known.
+ * Returns all dimension names that are or could be multi valued, or if the
input column specified a type that
+ * {@link ColumnType#isArray()}. Both cases indicate we want to explode the
under-lying multi value column. Since
+ * selectors on non-existent columns will show up as full of nulls, they are
effectively single valued, however
+ * capabilites on columns can also be null, for example during broker merge
with an 'inline' datasource subquery. We
+ * mark columns with null capabilites as candidates for explosion.
*/
- public static boolean hasNoExplodingDimensions(
+ public static List<String> findAllProbableExplodingDimensions(
Review comment:
Hmm, looking through this I just realized that this function can't tell
if dimensions are definitely multi-valued or not. It can only tell if they
_might_ be multi-valued. (But they also might not be!)
So I think we should move the check; instead of checking here, we should
check row by row as we actually run the query. If a multi-value row is ever
encountered, then at that point we should throw the error.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]