gianm commented on a change in pull request #12253:
URL: https://github.com/apache/druid/pull/12253#discussion_r804703490



##########
File path: docs/querying/multi-value-dimensions.md
##########
@@ -375,3 +375,8 @@ This query returns the following result:
 Note that, for groupBy queries, you could get similar result with a [having 
spec](having.md) but using a filtered
 `dimensionSpec` is much more efficient because that gets applied at the lowest 
level in the query processing pipeline.
 Having specs are applied at the outermost level of groupBy query processing.
+
+## Disable GroupBy on multivalue columns
+
+As grouping on multivalue columns causes implicit unnest, users can avoid this 
behaviour by setting
+`groupByEnableMultiValueUnnesting` in the query context to `false`. This will 
result the query to error out.

Review comment:
       I think we should leave this out of the docs until we the array-based 
story is complete. Then we can document this, array types, array functions, etc.

##########
File path: 
processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/GroupByQueryEngineV2.java
##########
@@ -327,35 +345,46 @@ public static int getCardinalityForArrayAggregation(
   }
 
   /**
-   * Checks whether all "dimensions" are either single-valued, or if the input 
column or output dimension spec has
-   * specified a type that {@link ColumnType#isArray()}. Both cases indicate 
we don't want to explode the under-lying
-   * multi value column. Since selectors on non-existent columns will show up 
as full of nulls, they are effectively
-   * single valued, however capabilites on columns can also be null, for 
example during broker merge with an 'inline'
-   * datasource subquery, so we only return true from this method when 
capabilities are fully known.
+   * Returns all dimension names that are or could be multi valued, or if the 
input column specified a type that
+   * {@link ColumnType#isArray()}. Both cases indicate we want to explode the 
under-lying multi value column. Since
+   * selectors on non-existent columns will show up as full of nulls, they are 
effectively single valued, however
+   * capabilites on columns can also be null, for example during broker merge 
with an 'inline' datasource subquery. We
+   * mark columns with null capabilites as candidates for explosion.
    */
-  public static boolean hasNoExplodingDimensions(
+  public static List<String> findAllProbableExplodingDimensions(

Review comment:
       Hmm, looking through this I just realized that this function can't tell 
if dimensions are definitely multi-valued or not. It can only tell if they 
_might_ be multi-valued. (But they also might not be!)
   
   So I think we should move the check; instead of checking here, we should 
check row by row as we actually run the query. If a multi-value row is ever 
encountered, then at that point we should throw the error.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to