gianm commented on PR #18144:
URL: https://github.com/apache/druid/pull/18144#issuecomment-2978050222

   @kgyrtkirk thank you for taking a look.
   
   > Due to the fact that classes were renamed/moved its harder to focus on the 
core aspects of the changes - since those changes have touched a lot of files 
I'm a bit worried if the PR enters a conflicted change after some other changes 
will be merged.
   > 
   > I think it doesn't change fundamentally a lot of things right now - is 
that right? but is it on the roadmap to take away the `shuffleSpec` from the 
`StageDefintion` ?
   
   There are a lot of renames, but most of them are just renaming things from 
`XFrameProcessorFactory` to `XStageProcessor`.
   
   Yes, the idea in this patch was to not change anything fundamentally, 
especially with the existing stage implementations. The main change is that in 
the future, stages can control their own shuffling logic, rather than being 
"forced" into having the standard approach applied.
   
   I don't plan to remove `shuffleSpec` from `StageDefinition`. It's still 
needed for a couple of reasons:
   
   - the MSQ framework still does determine how many workers should be 
launched, when they should be launched, and which workers should handle which 
partitions. It needs the `shuffleSpec` to help figure this out.
   - for the `GLOBAL_SORT` shuffle, the MSQ framework still determines the 
partition boundaries based on statistics sent from each worker. It needs the 
`shuffleSpec` to know it should activate this mode.
   
   For this reason, I have this text in the javadoc for `StageProcessor`:
   
   > For stages that shuffle, i.e. where `StageDefinition#doesShuffle()`, the 
outputs must be partitioned according to the `StageDefinition#getShuffleSpec()`.
   
   The idea is that the `shuffleSpec` still exists, and the stage processor 
must respect it. The `StandardStageRunner` and `StandardShuffleOperations` are 
provided for stages that want to do things the "standard" way. In the future I 
imagine some stages will have an optimized path for certain cases of 
`shuffleSpec`, and then fallback to the `StandardShuffleOperations` for other 
cases.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to