jihoonson commented on issue #8061: Native parallel batch indexing with shuffle
URL: 
https://github.com/apache/incubator-druid/issues/8061#issuecomment-516641048
 
 
   @nishantmonu51 good point! We currently have two provisioning strategies for
auto scaling, `simple` and `pendingTaskBased`, and both of them terminate
middleManagers once a long time has passed since they completed their last
task. I see two options for avoiding this issue.
   
   - Improve the provisioning strategy to consider intermediary data in 
middleManagers. If they are still serving intermediary data for parallel batch 
tasks, then the auto scaler shouldn't terminate them.
   - As you mentioned, store intermediary data on the deep storage.
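
A minimal sketch of the first option, only to illustrate the idea. `WorkerState`, `servingIntermediaryData`, and `selectTerminable` are hypothetical names for this example and are not Druid's actual provisioning API; how a middleManager would report that it still holds intermediary data is left open.

```java
import java.time.Duration;
import java.util.List;
import java.util.stream.Collectors;

public class TerminationFilter {

    // Hypothetical view of a middleManager as seen by the auto scaler.
    public static final class WorkerState {
        final String host;
        final Duration idleTime;                // time since the last task completed
        final boolean servingIntermediaryData;  // assumed to be reported by the worker

        public WorkerState(String host, Duration idleTime, boolean servingIntermediaryData) {
            this.host = host;
            this.idleTime = idleTime;
            this.servingIntermediaryData = servingIntermediaryData;
        }
    }

    // Returns the hosts that have been idle for at least maxIdle and are not
    // serving intermediary data for parallel batch tasks.
    public static List<String> selectTerminable(List<WorkerState> workers, Duration maxIdle) {
        return workers.stream()
            .filter(w -> w.idleTime.compareTo(maxIdle) >= 0)
            .filter(w -> !w.servingIntermediaryData)
            .map(w -> w.host)
            .collect(Collectors.toList());
    }
}
```

With this extra check, a middleManager that has been idle for a long time but still holds shuffle data stays out of the termination candidates until that data is cleaned up.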
   
   I'm inclined toward the first approach because 1) reading data from
middleManagers is more efficient than reading it from deep storage and 2)
cleaning up intermediary data in deep storage could be more complex than
cleaning it up on middleManagers (though it's still doable). What do you think?
   
   BTW, whichever way we go, I think this issue could be fixed in a
follow-up PR. Does this make sense?

