jihoonson commented on issue #6803: [Proposal] Kill Hadoop MR task on kill of 
ingestion task and resume ability for Hadoop ingestion tasks
URL: 
https://github.com/apache/incubator-druid/issues/6803#issuecomment-451321623
 
 
   Thanks @ankit0811. Sounds useful!
   
   For phase 1, I think we need a unified way to support various platforms like 
Hadoop, Spark, and so on. So, I would suggest changing the way Druid tasks are 
killed. Currently, the overlord sends a kill request to the middleManager where 
the task is running, and the middleManager simply destroys the task process. As 
a result, the task has no chance to prepare for stopping, for example by 
cleaning up its resources or killing its Hadoop jobs. Instead, I think the task 
could clean up its resources before stopping if we change how the middleManager 
kills it. This approach makes more sense to me because the Hadoop job would be 
started and killed in the same place (the Druid Hadoop task).
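
As a rough sketch of what I mean by "started and killed in the same place": the task owns the external job, so its stop hook can kill that job before the process goes away. All names here (`GracefulKillSketch`, `ExternalJob`, `HadoopLikeTask`) are hypothetical and only for illustration; they are not Druid or Hadoop classes.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

public class GracefulKillSketch {

    /** Hypothetical stand-in for a remote job (e.g. a Hadoop MR job). */
    static class ExternalJob {
        final AtomicBoolean killed = new AtomicBoolean(false);
        final CountDownLatch done = new CountDownLatch(1);

        /** Marks the job as finished normally. */
        void finish() {
            done.countDown();
        }

        /** Marks the job as killed and releases anyone waiting on it. */
        void kill() {
            killed.set(true);
            done.countDown();
        }
    }

    /** Hypothetical task that starts the job and is also responsible for killing it. */
    static class HadoopLikeTask {
        final ExternalJob job = new ExternalJob();

        /** Blocks until the job finishes or is killed, then reports a status. */
        String run() {
            try {
                job.done.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return "FAILED";
            }
            return job.killed.get() ? "FAILED" : "SUCCESS";
        }

        /** Cleanup hook: kill the job this task started so that run() can return. */
        void stopGracefully() {
            job.kill();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        HadoopLikeTask task = new HadoopLikeTask();
        Thread runner = new Thread(() -> System.out.println("task status: " + task.run()));
        runner.start();
        // Instead of destroying the process, ask the task to stop itself.
        task.stopGracefully();
        runner.join();
    }
}
```

If the process were simply destroyed, `kill()` would never run and the external job would keep going, which is the problem described above.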
   
   Fortunately, we have some logic already implemented. First, there's 
`stopGracefully` in `Task`. 
   
   ```java
     /**
      * Asks a task to arrange for its "run" method to exit promptly. This method will only be called if
      * {@link #canRestore()} returns true. Tasks that take too long to stop gracefully will be terminated with
      * extreme prejudice.
      */
     void stopGracefully();
   ```
   
   This is currently only called for restorable tasks, so you may want to make 
the Hadoop task restorable (maybe related to phase 2?). `stopGracefully` is 
currently called in `SingleTaskBackgroundRunner.stop()`, which in turn is called 
when the output stream of the task process is closed (see 
`ForkingTaskRunner.stop()` and `ExecutorLifecycle.start()`). So, it should work 
if you make the Hadoop task restorable and change the kill path to close the 
output stream of the task process instead of destroying the task process 
directly. What do you think?
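
To illustrate the "close the stream instead of destroying the process" idea, here is a minimal sketch of the child side: it drains an input stream and runs a stop hook once it sees end-of-stream. This is not Druid's code; `ParentStreamSketch` and `awaitParentClose` are hypothetical names, and in a real child process the stream would be `System.in`, which reaches end-of-stream when the parent closes its end of the pipe.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.atomic.AtomicBoolean;

public class ParentStreamSketch {

    /**
     * Drains the stream until end-of-stream, then runs the stop hook.
     * The stop hook is where a graceful stop (e.g. calling stopGracefully
     * on the running task) would be triggered.
     */
    static void awaitParentClose(InputStream in, Runnable stopHook) {
        try {
            while (in.read() != -1) {
                // Ignore any bytes; only end-of-stream matters here.
            }
        } catch (IOException e) {
            // Assumption of this sketch: a broken stream is treated like a closed one.
        }
        stopHook.run();
    }

    public static void main(String[] args) {
        AtomicBoolean stopped = new AtomicBoolean(false);
        // An empty stream is at end-of-stream immediately, which simulates
        // a parent that has already closed its end of the pipe.
        awaitParentClose(new ByteArrayInputStream(new byte[0]), () -> stopped.set(true));
        System.out.println("stop hook ran: " + stopped.get());
    }
}
```

On the parent side, the equivalent step would be closing the stream returned by `Process.getOutputStream()` rather than calling `Process.destroy()`, and then giving the child some time to exit on its own.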
