Akshat-Jain opened a new pull request, #16781:
URL: https://github.com/apache/druid/pull/16781

   ### Description
   
   Currently, we pass all operator factories (NaiveSort, NaivePartition, 
Window) in the list of operator factories for the window stage definition.
   
   Ideally, we shouldn’t have to pass anything except the window factories, 
since sorting and partitioning are expected to be handled by the shuffle spec 
of the previous stage.
   
   This PR changes the window stage definition to receive only the window 
operator factories. Making this change uncovered a bug: the logic for finding 
the shuffle spec for the next window stage was incorrect.
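
As a rough illustration of the change described above, the sketch below keeps only the window factories from a mixed list. The `Kind` enum and `windowFactoriesOnly` are hypothetical stand-ins for this example, not Druid's actual classes or the code in this PR.

```java
import java.util.ArrayList;
import java.util.List;

public class WindowFactoriesSketch {
    // Hypothetical stand-in for the three kinds of operator factory named above.
    enum Kind { NAIVE_SORT, NAIVE_PARTITION, WINDOW }

    // Keep only the window factories; sorting and partitioning are assumed
    // to be handled by the shuffle spec of the previous stage.
    static List<Kind> windowFactoriesOnly(List<Kind> all) {
        List<Kind> result = new ArrayList<>();
        for (Kind kind : all) {
            if (kind == Kind.WINDOW) {
                result.add(kind);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Kind> all = List.of(Kind.NAIVE_SORT, Kind.NAIVE_PARTITION, Kind.WINDOW);
        System.out.println(windowFactoriesOnly(all)); // prints [WINDOW]
    }
}
```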
   
   **Description of the above-mentioned bug:** When finding the shuffle spec, 
we shouldn't filter out columns based on the partition columns. Consider a 
query with a window function like:
   ```sql
   row_number() over (partition by array[1,2,length(cityName)] order by 
countryName) as c
   ```
   For this query, we get the SortOperator with both columns 
(`array[1,2,length(cityName)]`, `countryName`), and the partition operator with 
only `array[1,2,length(cityName)]`. With the current logic, `countryName` would 
be dropped from the shuffle spec because it isn't in the partition factory, 
which is incorrect.
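
The difference can be sketched with plain column-name lists. This is a minimal illustration under the assumption that the shuffle spec is derived from the sort columns; `filteredByPartition` and `allSortColumns` are hypothetical helper names, not methods from the Druid codebase.

```java
import java.util.ArrayList;
import java.util.List;

public class ShuffleColumnsSketch {
    // The filtering described as incorrect: a sort column survives only
    // if it also appears among the partition columns.
    static List<String> filteredByPartition(List<String> sortColumns, List<String> partitionColumns) {
        List<String> result = new ArrayList<>();
        for (String column : sortColumns) {
            if (partitionColumns.contains(column)) {
                result.add(column);
            }
        }
        return result;
    }

    // Without the filter: every sort column is kept, in its original order.
    static List<String> allSortColumns(List<String> sortColumns) {
        return new ArrayList<>(sortColumns);
    }

    public static void main(String[] args) {
        List<String> sort = List.of("array[1,2,length(cityName)]", "countryName");
        List<String> partition = List.of("array[1,2,length(cityName)]");

        // countryName is lost here, so rows would not be ordered by it.
        System.out.println(filteredByPartition(sort, partition));
        // Both columns are retained here.
        System.out.println(allSortColumns(sort));
    }
}
```

With the example query, the filtered variant returns only the `array[...]` expression, while the unfiltered variant keeps `countryName` as well.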
   
   This wasn't broken in the current code (and is exposed by this PR's 
changes) because we were explicitly passing the original sort and partitioning 
operators in the window stage definition, which overrode the behavior of the 
incorrectly evaluated shuffle spec.
   
   <hr>
   
   This PR has:
   
   - [x] been self-reviewed.
   - [x] added documentation for new or modified features or behaviors.
   - [ ] a release note entry in the PR description.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked 
related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in 
[licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
   - [x] added comments explaining the "why" and the intent of the code 
wherever would not be obvious for an unfamiliar reader.
   - [x] added unit tests or modified existing tests to cover new code paths, 
ensuring the threshold for [code 
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
 is met.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

