[ 
https://issues.apache.org/jira/browse/TEZ-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14528394#comment-14528394
 ] 

Rajesh Balamohan edited comment on TEZ-2414 at 5/5/15 1:08 PM:
---------------------------------------------------------------

Attaching patch for TEZ-2003 branch. Related patch for master is TEZ-1752. 

Attaching the review comments.

>>>>
Spoke to Rajesh offline about committing the patch into TEZ-2003 instead of 
master for now, and hardening it there.
Looking at the patch, especially TaskRunner and LogicalIORuntimeTask changes, I 
think it can be split into two - changes to LogicalIOProcessorRuntimeTask, 
RuntimeTask, TaskReporter, TezTaskRunner go into the branch, and the rest into 
the master. Rajesh, thoughts on splitting it ?
Comments (TaskReporter, LogicalIORuntimeTask etc changes) - mostly minor
processorClosed=true, initializedInputs.remove, initializedOutputs.remove() - 
should be fore the actual invocation of close. Otherwise if there's an error we 
may try to invoke close twice.
 LOG.info("Cleanup is complete. EventRouterThread is yet to be interrupted"); 
Nit: This may be misleading since the eventRouter may have been interrupted 
during the close() finally block.
+    } catch (Throwable t) {
+      LOG.warn("Error in final cleanup of task. ", t);
This isn't necessary. Exceptions are already being handled for individual close 
calls.
pubilc enum State - Can be an isRunning method like the existing hasInitialized 
method, instead of making the enum public
In TaskRunnerCallable, is there a need to check for InterruptedExceptions in 
the catch Throwable clause ?
Nit: task will never be null in TezTaskRunner
An exception from abortTask should probably be reported as a Failure. Can be 
fixed later in the branch.
Before taskFuture = executor.submit(callable); - checking the interrupt status 
may be useful. Otherwise the task would start, and the get is immediately 
interrupted.
maybeInterruptWaitingThread - Don't think we should be interrupting the main 
thread at this point. That can have several consequences. If task.run() 
returned without an Exception, but due to an abort/interrupt invocation, and 
the close succeeds after this - we'll try reporting the task as successful. If 
there's an error from close, there's a possibility that the main thread would 
have deregistered the task from the Reporter, and the TaskRunnerCallable thread 
would fail while indicating an error. Throwing an InterruptedException here and 
gracefully falling off the TaskRunnerCallable thread should take care of this. 
After this, the main run() thread regains control and can shutdown.
For a future jira. Inputs / Outputs could check for interrupts during shutdown 
to prevent costly spill attempts. The rest of the patch probably addresses this 
though.
>>>>


>>> processorClosed=true, initializedInputs.remove, initializedOutputs.remove() 
>>> - should be fore the actual invocation of close
- Fixed. Also removed log statements & unwanted try block.

>> pubilc enum State - Can be an isRunning
- Fixed. Added RuntimeTask.isRunning()

>> In TaskRunnerCallable, is there a need to check for InterruptedExceptions
- Yes, needed. If processor run is not consuming the interrupt, it might be 
throwing interruptedException & in such cases, we need to reset the interrupt 
status.

>> An exception from abortTask should probably be reported as a Failure
- Fixed.

>> Before taskFuture = executor.submit(callable); - checking the interrupt 
>> status may be useful
- Fixed.



was (Author: rajesh.balamohan):
Attaching patch for TEZ-2003 branch.

> LogicalIOProcessorRuntimeTask, RuntimeTask, TezTaskRunner should handle 
> interrupts & carry out necessary cleanups
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: TEZ-2414
>                 URL: https://issues.apache.org/jira/browse/TEZ-2414
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>         Attachments: TEZ-2414.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to