jihoonson commented on issue #8061: Native parallel batch indexing with shuffle
URL: https://github.com/apache/incubator-druid/issues/8061#issuecomment-511029723

> That is fine, and I am guessing it will batch task status calls, i.e., ask for the status of multiple supervisor tasks in one overlord API request.

Correct.

> Will the worker tasks complete and exit eventually, or will they be left running until manual intervention?

All subtasks report their pushed segments to the supervisor task at the end of indexing. So if the supervisor task is no longer running, they would fail at that stage. However, this is still quite annoying because they occupy middleManager resources unnecessarily until then. I guess we could add a health check to subtasks so they can tell whether the supervisor task is still alive.

> We might consider making the supervisor a real "supervisor" like Kafka instead of a "task" so that it gets special powers to manage things better? But I think I remember some comment that they are made tasks because tasks are better equipped to work with the available locking framework. One option could be to let the spawned worker tasks do the locking; since all worker tasks would be in the same group, multiple worker tasks trying to obtain the same lock would still work... this is unverified wishful thinking :)

Yeah, my original intention was to add a "supervisor" like Kafka rather than a new task type, but I changed my mind and made it a task so it could use the existing task lock system. And now I'm inclined to keep the current supervisor task design because:

- A "supervisor" is designed more for per-dataSource stream ingestion. It runs forever once it's submitted, and the history of its spec changes is recorded in the metadata store.
- The "supervisor" design is less scalable, since the overlord handles all supervisors and the requests/responses for their tasks. One of our customers has already hit this: they were running more than 1000 tasks per Kafka supervisor, and their Kafka ingestion got stuck because the overlord couldn't handle that many HTTP requests from tasks in time.

For the second reason, I would even vote to demote the Kafka/Kinesis supervisor to a supervisor task.
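To make the subtask health-check idea a bit more concrete, here is a minimal sketch in Python (Druid itself is Java; this is only illustrative). It assumes a hypothetical `probe` callable that returns True while the supervisor task is still running, for example by asking the overlord for the task's status. Neither `SupervisorHealthCheck` nor `probe` exists in Druid. The subtask gives up after a configurable number of consecutive failed probes instead of waiting until it tries to report its pushed segments.

```python
import time
from typing import Callable, Optional


class SupervisorHealthCheck:
    """Hypothetical watchdog a subtask could run to detect an orphaned run.

    Assumption: `probe` returns True while the supervisor task is running
    (e.g. by querying the overlord). It is not an existing Druid API.
    """

    def __init__(self, probe: Callable[[], bool], max_failures: int = 3):
        self.probe = probe
        self.max_failures = max_failures
        self.consecutive_failures = 0

    def check_once(self) -> bool:
        """Run one probe; return False once the subtask should give up."""
        try:
            alive = self.probe()
        except Exception:
            # An unreachable supervisor is treated the same as a stopped one.
            alive = False
        # Reset on success so a single transient failure is tolerated.
        self.consecutive_failures = 0 if alive else self.consecutive_failures + 1
        return self.consecutive_failures < self.max_failures

    def run(self, interval_sec: float = 30.0,
            max_checks: Optional[int] = None,
            sleep: Callable[[float], None] = time.sleep) -> bool:
        """Probe every `interval_sec` seconds.

        Returns True if the supervisor looks gone (the subtask should stop
        itself and free its middleManager slot), or False if `max_checks`
        probes all succeeded. `max_checks` mainly exists to bound the loop in
        tests; a real subtask would keep checking until it finishes.
        """
        checks = 0
        while max_checks is None or checks < max_checks:
            checks += 1
            if not self.check_once():
                return True
            sleep(interval_sec)
        return False
```

Requiring several consecutive failures avoids killing subtasks on a brief overlord hiccup. Note that every probe is one more request to the overlord, which is exactly the load concern raised above, so the interval (and batching status calls across tasks) would matter in practice.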
