jihoonson edited a comment on issue #8061: Native parallel batch indexing with 
shuffle
URL: 
https://github.com/apache/incubator-druid/issues/8061#issuecomment-510670825
 
 
   @himanshug thanks for taking a look!
   
   > does overlord treat "supervisor" task as special task to be able to 
initiate cleanup requests? what if the MM is down temporarily or if cleanup 
fails for some reason ? In addition to overlord cleanup requests, It might be 
good for middleManagers to periodically check whether "supervisor" task is 
running or not and do the self cleanup.
   
   Ah, this is a good point. To handle middleManager failure, a sort of 
self-cleanup can be triggered once some amount of time has elapsed since the 
last access to any partition of a supervisorTask. Does this sound good?
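   
   A rough sketch of the timeout idea, for illustration only. This is not 
Druid code: `PartitionCleanupTracker`, its methods, and the timeout parameter 
are hypothetical names, and actual file deletion is left out.

```python
import time


class PartitionCleanupTracker:
    """Hypothetical helper: tracks the last partition access per supervisor task."""

    def __init__(self, idle_timeout_sec, clock=time.monotonic):
        self._idle_timeout_sec = idle_timeout_sec
        self._clock = clock  # injectable so the behavior can be tested
        self._last_access = {}  # supervisorTaskId -> time of last access

    def record_access(self, supervisor_task_id):
        # Called whenever any partition of this supervisor task is written or read.
        self._last_access[supervisor_task_id] = self._clock()

    def cleanup_expired(self):
        # Return the supervisor task IDs whose partitions have been idle for
        # at least the timeout, and stop tracking them. A real implementation
        # would also delete their intermediary files here.
        now = self._clock()
        expired = [
            task_id
            for task_id, last in self._last_access.items()
            if now - last >= self._idle_timeout_sec
        ]
        for task_id in expired:
            del self._last_access[task_id]
        return expired
```

   A middleManager could call `cleanup_expired()` periodically, so cleanup 
would still happen if a cleanup request from the overlord is missed.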
   
   > also maybe have some MM level configuration around maximum disk space that 
can be utilized for intermediary data.
   
   Thanks for reminding me of this; I forgot to add it to the proposal. I'm 
thinking of using the existing `StorageLocationConfig` for this. To fully 
utilize the disk bandwidth, the partitions of the same supervisorTaskId will be 
assigned to the configured locations in a round-robin fashion. I will update 
the proposal shortly.
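   
   One way the round-robin assignment with a disk space limit might look, as 
an illustrative sketch. `Location` only loosely models a storage location with 
a path and a maximum size; it and `RoundRobinAssigner` are hypothetical and not 
the actual `StorageLocationConfig` API.

```python
from dataclasses import dataclass


@dataclass
class Location:
    """Hypothetical stand-in for a configured storage location."""
    path: str
    max_size: int  # maximum bytes usable for intermediary data
    used: int = 0


class RoundRobinAssigner:
    def __init__(self, locations):
        self._locations = list(locations)
        self._next_index = {}  # supervisorTaskId -> index of next location to try

    def assign(self, supervisor_task_id, partition_size):
        # Start from this supervisor task's next location and take the first
        # one that still has room, so its partitions spread across disks.
        n = len(self._locations)
        start = self._next_index.get(supervisor_task_id, 0)
        for offset in range(n):
            i = (start + offset) % n
            location = self._locations[i]
            if location.used + partition_size <= location.max_size:
                location.used += partition_size
                self._next_index[supervisor_task_id] = (i + 1) % n
                return location.path
        return None  # no location has enough space left
```

   Returning `None` when every location is full is just one choice for the 
sketch; failing the subtask would be another.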
   
   > I think a user defined upper limit could always exist in all "supervisor" 
tasks that spawn extra tasks so that user can plan worker capacity knowing how 
many tasks at a maximum would be running via parallel [shuffle] task.
   
   This is already supported with `maxNumSubTasks` 
(https://druid.apache.org/docs/latest/ingestion/native_tasks.html#tuningconfig).
 `maxNumSubTasks` limits the number of subtasks running at any one time while a 
parallel index task is running. `numSecondPhaseTasks` is somewhat different. 
It's the total number of phase 2 tasks, and the supervisor task will regard 
phase 2 as succeeded once `numSecondPhaseTasks` phase 2 tasks have succeeded.
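   
   A toy simulation of how the two settings differ. `run_second_phase` is a 
hypothetical function written for this illustration; it assumes every subtask 
succeeds and does not reflect the supervisor task's real scheduling code.

```python
def run_second_phase(num_second_phase_tasks, max_num_sub_tasks):
    """Simulate phase 2 and return (succeeded_count, peak_concurrency)."""
    if max_num_sub_tasks < 1:
        raise ValueError("max_num_sub_tasks must be at least 1")

    pending = list(range(num_second_phase_tasks))
    running = []
    succeeded = 0
    peak = 0
    # Phase 2 counts as succeeded once num_second_phase_tasks tasks succeed.
    while succeeded < num_second_phase_tasks:
        # max_num_sub_tasks caps how many subtasks run at the same time.
        while pending and len(running) < max_num_sub_tasks:
            running.append(pending.pop(0))
        peak = max(peak, len(running))
        running.pop(0)  # pretend the oldest running subtask succeeds
        succeeded += 1
    return succeeded, peak
```

   With 5 phase 2 tasks and a limit of 2, all 5 tasks eventually succeed but 
never more than 2 run at once.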
