[GitHub] [druid] LakshSingla commented on a diff in pull request #13352: Add a limit to the number of columns in the CLUSTERED BY clause

GitBox Sun, 13 Nov 2022 20:27:34 -0800


LakshSingla commented on code in PR #13352:
URL: https://github.com/apache/druid/pull/13352#discussion_r1021064001



##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/exec/QueryValidator.java:
##########
@@ -55,6 +57,15 @@ public static void validateQueryDef(final QueryDefinition 
queryDef)
         throw new ISE("Number of workers must be greater than 0");
       }
     }
+
+    // Check if the number of columns in the query's CLUSTERED BY clause donot 
exceed the limit
+    ClusterBy queryClusteredBy = 
queryDef.getFinalStageDefinition().getClusterBy();

Review Comment:
   The cluster by columns in the earlier stages might not have a 1:1 
correspondence with the query that the user has written therefore raising a 
cluster by error, in that case, shouldn't be actionable for the user IMO. Hence 
I only added the limit in the final stage (the original query that the user has 
written). Along with the Sequential merge mode on, I think that there should be 
enough guard rails in place to prevent an OOM.
   
   However we can add a limit on the cluster by in the other stages if we 
rephrase the error message as something like "Enough grouping keys present in 
stage [xx], the query might OOM". Those cluster by keys can correspond to 
something present in the group by clause for example. WDYT? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [druid] LakshSingla commented on a diff in pull request #13352: Add a limit to the number of columns in the CLUSTERED BY clause

Reply via email to