Akshat-Jain opened a new pull request, #16781: URL: https://github.com/apache/druid/pull/16781
### Description

Currently, we pass all operator factories (NaiveSort, NaivePartition, Window) in the list of operator factories for the window stage definition. Ideally, we shouldn't have to pass anything except the window operator factories, since sorting and partitioning are expected to be handled by the shuffle spec of the previous stage.

This PR changes the window stage definition so that only the window operator factories are passed.

Making this change exposed a bug: the logic for finding the shuffle spec for the next window stage was incorrect.

**Description of the above-mentioned bug:**

When finding the shuffle spec, we shouldn't filter out columns based on the partition columns.

For queries having a window function like:

```sql
row_number() over (partition by array[1,2,length(cityName)] order by countryName) as c
```

We get the SortOperator with both columns (`array[1,2,length(cityName)]`, `countryName`), and we get the partition operator with only `array[1,2,length(cityName)]`.

With the current logic, we would have dropped `countryName` from the shuffle spec because it wasn't in the partition factory, which is incorrect.

This wasn't visibly broken in the current code (and only surfaces with this PR's changes) because we were explicitly passing the original sort and partitioning operators in the window stage definition, which overrode the incorrectly evaluated shuffle spec.

<hr>

This PR has:

- [x] been self-reviewed.
- [x] added documentation for new or modified features or behaviors.
- [ ] a release note entry in the PR description.
- [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
- [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
- [x] added comments explaining the "why" and the intent of the code wherever it would not be obvious for an unfamiliar reader.
- [x] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
- [ ] added integration tests.
- [ ] been tested in a test Druid cluster.
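
To make the shuffle spec bug described above concrete, here is a minimal, self-contained sketch. It is not Druid's actual code: the class `ShuffleSpecSketch` and both helper methods are hypothetical names invented for illustration, and columns are modeled as plain strings. It only contrasts filtering the sort columns by the partition columns with keeping all of them.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: hypothetical names, not classes or methods from Druid.
public class ShuffleSpecSketch {

    // Mirrors the incorrect behavior: keep a sort column only if it is also
    // a partition column, which drops order-by-only columns.
    static List<String> filteredByPartitionColumns(List<String> sortColumns,
                                                   List<String> partitionColumns) {
        List<String> result = new ArrayList<>();
        for (String column : sortColumns) {
            if (partitionColumns.contains(column)) {
                result.add(column);
            }
        }
        return result;
    }

    // Mirrors the intended behavior: every sort column is kept, in order.
    static List<String> allSortColumns(List<String> sortColumns) {
        return new ArrayList<>(sortColumns);
    }

    public static void main(String[] args) {
        // Columns from the example query in the description.
        List<String> sortColumns =
            List.of("array[1,2,length(cityName)]", "countryName");
        List<String> partitionColumns =
            List.of("array[1,2,length(cityName)]");

        System.out.println("filtered:   "
            + filteredByPartitionColumns(sortColumns, partitionColumns));
        System.out.println("unfiltered: " + allSortColumns(sortColumns));
    }
}
```

In this sketch the filtered variant loses `countryName`, so rows within a partition would no longer be ordered by it. The unfiltered variant keeps both columns.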
