Re: [PR] Grouping Engine fix when a limit spec with different order by columns is applied (druid)

via GitHub Wed, 12 Jun 2024 03:35:42 -0700


kgyrtkirk commented on code in PR #16534:
URL: https://github.com/apache/druid/pull/16534#discussion_r1636216662



##########
processing/src/main/java/org/apache/druid/query/groupby/GroupByQuery.java:
##########
@@ -565,7 +565,7 @@ private boolean canDoLimitPushDown(
    * limit/order spec (unlike non-push down case where the results always use 
the default natural ascending order),
    * so when merging these partial result streams, the merge needs to use the 
same ordering to get correct results.
    */
-  private Ordering<ResultRow> getRowOrderingForPushDown(
+  private OrderingAndDimensions getRowOrderingAndDimensionsForPushDown(

Review Comment:
   note: I think you could keep the old method with its signature; and service 
it like: `return getOrderingAndDimensions(false).getRowOrdering()`



##########
processing/src/main/java/org/apache/druid/query/groupby/GroupByQuery.java:
##########
@@ -577,6 +577,7 @@ private Ordering<ResultRow> getRowOrderingForPushDown(
     final List<Boolean> needsReverseList = new ArrayList<>();
     final List<ColumnType> dimensionTypes = new ArrayList<>();
     final List<StringComparator> comparators = new ArrayList<>();
+    final List<DimensionSpec> dimensionsInOrder = new ArrayList<>();

Review Comment:
   this double-book-keeping is a pretty wierd way to supply this detail - but 
it works...
   
   I think it would be more beneficial
   * identify the differences between `getRowOrderingForPushDown` and 
`getRowOrdering` 
   * merge the differing behaviour into the `getRowOrderingForPushDown` method
      * isn't the main difference is that `getRowOrdering` ignores the 
`limitSpec` regardless its set?
   * possibly also eat-up the `RowBasedGrouperHelper` which is another copy of 
the process of creating the comparators - I think correctness of the execution 
depends on knowing on which columns we ordered and in what order....having just 
one source of truth could reduce the amount of issues we face
    
   all of the above are refactors - which might delay the fix of this bug....



##########
processing/src/main/java/org/apache/druid/query/groupby/GroupingEngine.java:
##########
@@ -686,8 +686,11 @@ public Sequence<ResultRow> processSubtotalsSpec(
           processingConfig.intermediateComputeSizeBytes()
       );
 
-      List<String> queryDimNames = 
baseSubtotalQuery.getDimensions().stream().map(DimensionSpec::getOutputName)
-                                                    
.collect(Collectors.toList());
+      List<String> queryDimNamesInOrder = 
baseSubtotalQuery.getOrderingAndDimensions(false)
+                                                           .getDimensions()
+                                                           .stream()
+                                                           
.map(DimensionSpec::getOutputName)
+                                                           
.collect(Collectors.toList());

Review Comment:
   note: can this be hidden somewhere - like a method named: 
`getDimentsionColumnNames` or something



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Grouping Engine fix when a limit spec with different order by columns is applied (druid)

Reply via email to