jihoonson commented on issue #8061: Native parallel batch indexing with shuffle
URL: https://github.com/apache/incubator-druid/issues/8061#issuecomment-516641048

@nishantmonu51 good point! We currently have two provisioning strategies for auto scaling, i.e., `simple` and `pendingTaskBased`, and both of them look to terminate middleManagers if it has been a long time since they completed the last task. I think there could be two options available to avoid this issue.

- Improve the provisioning strategy to consider intermediary data in middleManagers. If they are still serving intermediary data for parallel batch tasks, then the auto scaler shouldn't terminate them.
- As you mentioned, store intermediary data on the deep storage.

I'm inclined to the first approach because 1) it's more efficient to read data from middleManagers than from deep storage and 2) intermediary data cleanup for deep storage could be more complex than that for middleManagers (it's still doable though). What do you think?

BTW, no matter what way we go, I think this issue could be fixed in a follow-up PR. Does this make sense?
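
To make the first option concrete, here is a minimal sketch of the kind of check a provisioning strategy could apply before terminating idle middleManagers. It is not Druid's actual code: `WorkerSnapshot`, `TerminationFilter`, `selectTerminable`, and the `servingIntermediaryData` flag are hypothetical names made up for illustration, and how the auto scaler would learn that a middleManager still holds intermediary data is left open.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical view of one middleManager as seen by the auto scaler.
final class WorkerSnapshot {
    final String host;
    // Milliseconds since this worker completed its last task.
    final long idleMillis;
    // Assumption: true while parallel batch tasks may still read
    // intermediary (shuffle) data from this worker.
    final boolean servingIntermediaryData;

    WorkerSnapshot(String host, long idleMillis, boolean servingIntermediaryData) {
        this.host = host;
        this.idleMillis = idleMillis;
        this.servingIntermediaryData = servingIntermediaryData;
    }
}

final class TerminationFilter {
    // Returns the hosts that have been idle for at least maxIdleMillis
    // AND are not serving intermediary data. Idle workers that still
    // serve intermediary data are kept alive.
    static List<String> selectTerminable(List<WorkerSnapshot> workers, long maxIdleMillis) {
        List<String> terminable = new ArrayList<>();
        for (WorkerSnapshot w : workers) {
            boolean idleTooLong = w.idleMillis >= maxIdleMillis;
            if (idleTooLong && !w.servingIntermediaryData) {
                terminable.add(w.host);
            }
        }
        return terminable;
    }
}
```

With this shape, the existing idle-time rule stays as it is and the intermediary-data condition is only an extra veto, so either strategy (`simple` or `pendingTaskBased`) could in principle apply the same filter.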
