[GitHub] [incubator-druid] jihoonson commented on issue #8061: Native parallel batch indexing with shuffle

GitBox Fri, 12 Jul 2019 13:53:24 -0700

jihoonson commented on issue #8061: Native parallel batch indexing with shuffle
URL: https://github.com/apache/incubator-druid/issues/8061#issuecomment-511029723

   > that is fine and I am guessing it will batch task status call i.e. ask 
status of multiple supervisor tasks in one overlord api request.
   
   Correct.
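
   As a rough illustration of the batching idea, here is a minimal sketch in 
Python. It is not Druid code: `poll_task_statuses`, `fetch_statuses`, and 
`batch_size` are hypothetical names, and `fetch_statuses` stands in for 
whatever single overlord request returns the status of several tasks at once.

```python
from typing import Callable, Dict, List, Sequence


def poll_task_statuses(
    task_ids: Sequence[str],
    fetch_statuses: Callable[[List[str]], Dict[str, str]],
    batch_size: int = 100,
) -> Dict[str, str]:
    """Collect the status of many tasks using one request per batch.

    fetch_statuses is a hypothetical stand-in for a single overlord API call
    that accepts a list of task IDs and returns {task_id: status}.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    statuses: Dict[str, str] = {}
    # Walk the ID list in fixed-size slices so the request count is
    # ceil(len(task_ids) / batch_size) instead of len(task_ids).
    for start in range(0, len(task_ids), batch_size):
        batch = list(task_ids[start:start + batch_size])
        statuses.update(fetch_statuses(batch))
    return statuses
```

   With 250 task IDs and a batch size of 100, this issues three requests 
instead of 250.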
   
   > will the worker tasks complete and exit eventually or they will be left 
running till manual intervention ?
   
   All subtasks report their pushed segments to the supervisor task at the end 
of indexing, so if the supervisor task is not running, they will fail at that 
stage. This is still quite annoying, though, since until then they occupy 
middleManager resources unnecessarily. I guess we could add a health check in 
the subtasks that verifies the supervisor task is still running.
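
   One possible shape for such a subtask-side health check, as a minimal 
sketch in Python rather than actual Druid code. All names here are 
hypothetical: `is_supervisor_running` stands in for whatever probe a subtask 
would use, and `max_consecutive_failures` is an assumed tolerance so that a 
single failed probe does not kill the subtask.

```python
from typing import Callable, Iterable


def run_with_supervisor_check(
    work_units: Iterable[Callable[[], None]],
    is_supervisor_running: Callable[[], bool],
    max_consecutive_failures: int = 3,
) -> bool:
    """Run work units, probing the supervisor before each one.

    Returns True if every unit ran, or False if the subtask gave up because
    the supervisor looked dead for max_consecutive_failures probes in a row.
    """
    failures = 0
    for unit in work_units:
        try:
            alive = is_supervisor_running()
        except Exception:
            # Treat a probe that raises (e.g. a network error) as a failure.
            alive = False

        # A successful probe resets the counter; a failed one increments it.
        failures = 0 if alive else failures + 1
        if failures >= max_consecutive_failures:
            # Exit early so the worker slot is released.
            return False

        unit()
    return True
```

   The point of exiting early is that the subtask frees its middleManager 
slot instead of running to the end of indexing and failing only when it tries 
to report segments.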
   
   > we might consider making supervisor real "supervisor" like Kafka instead 
of "task" so that they get special powers to manage things better ? but I think 
I am remembering some comment that they are made tasks because tasks are better 
equipped to work with locking framework available. one option could be to let 
spawned worker task do the locking , since all worker tasks would be in same 
group so multiple worker tasks trying to obtain same lock would still work..... 
this is unverified wishful thinking :)
   
   Yeah, my original intention was to add a "supervisor" like Kafka's rather 
than a new task type, but I changed my mind so that I could use the existing 
task lock system. And now I'm inclined to keep the current design of the 
supervisor task because:
   
   - A "supervisor" is designed more for stream ingestion for each 
dataSource. It runs forever once it's submitted, and the history of its spec 
changes is recorded in the metadata store.
   - The "supervisor" design is less scalable since the overlord handles all 
supervisors and the requests/responses for their tasks. One of our customers 
has already run into this. They were running more than 1000 tasks for each 
Kafka supervisor, and Kafka ingestion got stuck because their overlord couldn't 
handle that many HTTP requests from tasks in time.
   
   For the second reason, I would even vote to demote the Kafka/Kinesis 
supervisors to supervisor tasks.


