Akshat-Jain commented on code in PR #16804:
URL: https://github.com/apache/druid/pull/16804#discussion_r1697097941


##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/querykit/WindowOperatorQueryKit.java:
##########
@@ -91,7 +91,6 @@ public QueryDefinition makeQueryDefinition(
     ShuffleSpec nextShuffleSpec = 
findShuffleSpecForNextWindow(operatorList.get(0), maxWorkerCount);

Review Comment:
   `findShuffleSpecForNextWindow()` returns null when we have empty over clause.
   
   In my original commit on this PR, I changed the logic of 
`findShuffleSpecForNextWindow()` to be:
   ```java
   if (partition == null) {
         return null;
       }
   if (partition.getPartitionColumns().isEmpty()) {
         return MixShuffleSpec.instance();
       }
   ```
   
   But Adarsh pointed out that this would propagate MixShuffleSpec to the other 
query kits which is inefficient.
   
   For example, it would make `shuffleSpec=MixShuffleSpec` for queries where we 
have multiple Scan stages before the Window stage. Sample query: 
   ```sql
   WITH "ext" AS (
     select * from wikipedia left join drill_wikipedia on 
wikipedia."countryName" = drill_wikipedia."countryName" where 
wikipedia."countryName" in ('Guatemala') limit 5
   )
   SELECT
     "__time",
     "countryName",
     "channel",
     row_number() over (order by countryName, cityName, channel) as c1,
     count(channel) over (partition by cityName order by countryName, cityName, 
channel) as c2
   FROM "ext"
   group by __time, countryName, cityName, channel 
   order by countryName, cityName, channel
   ```
   
   The above query gives:
   
![image](https://github.com/user-attachments/assets/fcf2211b-2b2c-4d57-8526-78a5e75a7f53)
   
   
   Hence, it's better to not propagate MixShuffleSpec to the other query kits, 
and just override it here only for the just-previous stage.
   
   Hope that makes sense!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to