himanshug commented on issue #8061: Native parallel batch indexing with shuffle
URL: https://github.com/apache/incubator-druid/issues/8061#issuecomment-510644000
 
 
   SGTM in general 
   
   > When the supervisor task is finished (either succeeded or failed), the 
overlord sends cleanup requests with supervisorTaskId to all middleManagers 
(and indexers)
   
   Does the overlord treat the "supervisor" task as a special task so that it 
can initiate cleanup requests? What if a middleManager (MM) is temporarily down, 
or the cleanup fails for some reason? In addition to the overlord's cleanup 
requests, it might be good for middleManagers to periodically check whether the 
"supervisor" task is still running and, if it is not, clean up on their own.
   It may also be worth having an MM-level configuration for the maximum disk 
space that can be used for intermediary data.
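   
   To make the self-cleanup idea concrete, here is a rough sketch (Python, not 
Druid code) of what a middleManager-side sweep plus a disk cap could look like. 
The directory layout (one directory per supervisorTaskId), the 
`is_supervisor_running` callback, and the `max_bytes` limit are all assumptions 
made for illustration; a real implementation would ask the overlord for task 
status and run the sweep on a timer.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, List


def sweep_intermediary_data(
    base_dir: Path,
    is_supervisor_running: Callable[[str], bool],
) -> List[str]:
    """Delete intermediary data whose supervisor task is no longer running.

    Assumed (hypothetical) layout: base_dir/<supervisorTaskId>/...
    Returns the supervisor task IDs whose data was removed.
    """
    removed = []
    for entry in sorted(base_dir.iterdir()):
        if not entry.is_dir():
            continue
        supervisor_task_id = entry.name
        # is_supervisor_running stands in for a status query to the overlord.
        if not is_supervisor_running(supervisor_task_id):
            shutil.rmtree(entry, ignore_errors=True)
            removed.append(supervisor_task_id)
    return removed


def used_bytes(base_dir: Path) -> int:
    """Total size of all regular files under base_dir."""
    return sum(p.stat().st_size for p in base_dir.rglob("*") if p.is_file())


def can_accept(base_dir: Path, incoming_bytes: int, max_bytes: int) -> bool:
    """Check a (hypothetical) MM-level cap on intermediary data."""
    return used_bytes(base_dir) + incoming_bytes <= max_bytes


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        for name in ("supervisor_a", "supervisor_b"):
            (base / name).mkdir()
            (base / name / "partition.bin").write_bytes(b"x" * 10)
        # Pretend only supervisor_a is still running.
        print(sweep_intermediary_data(base, lambda t: t == "supervisor_a"))
```

   Because the sweep only depends on task status, it still works when an 
overlord cleanup request was missed while the MM was down.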
   
   > The supervisor task provides an additional configuration in its 
tuningConfig, i.e., numSecondPhaseTasks or inputRowsPerSecondPhaseTask, to 
support control of parallelism of the phase 2. This will be improved to 
automatically determine the optimal parallelism in the future.
   
   I think a user-defined upper limit should always exist in every "supervisor" 
task that spawns extra tasks, so that users can plan worker capacity knowing the 
maximum number of tasks a parallel [shuffle] task would run at once.
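   
   As an illustration of such a limit, a small sketch (Python, not Druid code): 
the number of phase 2 tasks is derived from inputRowsPerSecondPhaseTask as in 
the proposal, and a user-defined cap bounds how many run at the same time. The 
name `max_concurrent_tasks` and the wave-based scheduling are assumptions for 
the example; a real scheduler would more likely start a new task as soon as a 
slot frees up.

```python
import math
from typing import List


def second_phase_task_count(total_input_rows: int, input_rows_per_task: int) -> int:
    """Number of phase 2 tasks needed for the given rows-per-task setting."""
    if input_rows_per_task <= 0:
        raise ValueError("input_rows_per_task must be positive")
    return max(1, math.ceil(total_input_rows / input_rows_per_task))


def schedule_in_waves(num_tasks: int, max_concurrent_tasks: int) -> List[List[int]]:
    """Group task indexes into waves of at most max_concurrent_tasks each.

    max_concurrent_tasks is the hypothetical user-defined upper limit.
    """
    if max_concurrent_tasks <= 0:
        raise ValueError("max_concurrent_tasks must be positive")
    task_ids = list(range(num_tasks))
    return [
        task_ids[i:i + max_concurrent_tasks]
        for i in range(0, num_tasks, max_concurrent_tasks)
    ]


if __name__ == "__main__":
    n = second_phase_task_count(total_input_rows=1000, input_rows_per_task=300)
    print(n, schedule_in_waves(n, max_concurrent_tasks=3))
```

   With the cap in place, the worker capacity needed by one parallel task never 
exceeds `max_concurrent_tasks` slots, regardless of how many sub tasks it 
creates in total.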
