gianm commented on PR #18144: URL: https://github.com/apache/druid/pull/18144#issuecomment-2978050222
@kgyrtkirk thank you for taking a look. > Due to the fact that classes were renamed/moved its harder to focus on the core aspects of the changes - since those changes have touched a lot of files I'm a bit worried if the PR enters a conflicted change after some other changes will be merged. > > I think it doesn't change fundamentally a lot of things right now - is that right? but is it on the roadmap to take away the `shuffleSpec` from the `StageDefintion` ? There are a lot of renames, but most of them are just renaming things from `XFrameProcessorFactory` to `XStageProcessor`. Yes, the idea in this patch was to not change anything fundamentally, especially with the existing stage implementations. The main change is that in the future, stages can control their own shuffling logic, rather than being "forced" into having the standard approach applied. I don't plan to remove `shuffleSpec` from `StageDefinition`. It's still needed for a couple of reasons: - the MSQ framework still does determine how many workers should be launched, when they should be launched, and which workers should handle which partitions. It needs the `shuffleSpec` to help figure this out. - for the `GLOBAL_SORT` shuffle, the MSQ framework still determines the partition boundaries based on statistics sent from each worker. It needs the `shuffleSpec` to know it should activate this mode. For this reason, I have this text in the javadoc for `StageProcessor`: > For stages that shuffle, i.e. where `StageDefinition#doesShuffle()`, the outputs must be partitioned according to the `StageDefinition#getShuffleSpec()`. The idea is that the `shuffleSpec` still exists, and the stage processor must respect it. The `StandardStageRunner` and `StandardShuffleOperations` are provided for stages that want to do things the "standard" way. In the future I imagine some stages will have an optimized path for certain cases of `shuffleSpec`, and then fallback to the `StandardShuffleOperations` for other cases. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
