[
https://issues.apache.org/jira/browse/TEZ-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14528394#comment-14528394
]
Rajesh Balamohan edited comment on TEZ-2414 at 5/5/15 1:08 PM:
---------------------------------------------------------------
Attaching patch for TEZ-2003 branch. Related patch for master is TEZ-1752.
Attaching the review comments.
>>>>
Spoke to Rajesh offline about committing the patch into TEZ-2003 instead of
master for now, and hardening it there.
Looking at the patch, especially TaskRunner and LogicalIORuntimeTask changes, I
think it can be split into two - changes to LogicalIOProcessorRuntimeTask,
RuntimeTask, TaskReporter, TezTaskRunner go into the branch, and the rest into
the master. Rajesh, thoughts on splitting it ?
Comments (TaskReporter, LogicalIORuntimeTask etc changes) - mostly minor
processorClosed=true, initializedInputs.remove, initializedOutputs.remove() -
should be fore the actual invocation of close. Otherwise if there's an error we
may try to invoke close twice.
LOG.info("Cleanup is complete. EventRouterThread is yet to be interrupted");
Nit: This may be misleading since the eventRouter may have been interrupted
during the close() finally block.
+ } catch (Throwable t) {
+ LOG.warn("Error in final cleanup of task. ", t);
This isn't necessary. Exceptions are already being handled for individual close
calls.
pubilc enum State - Can be an isRunning method like the existing hasInitialized
method, instead of making the enum public
In TaskRunnerCallable, is there a need to check for InterruptedExceptions in
the catch Throwable clause ?
Nit: task will never be null in TezTaskRunner
An exception from abortTask should probably be reported as a Failure. Can be
fixed later in the branch.
Before taskFuture = executor.submit(callable); - checking the interrupt status
may be useful. Otherwise the task would start, and the get is immediately
interrupted.
maybeInterruptWaitingThread - Don't think we should be interrupting the main
thread at this point. That can have several consequences. If task.run()
returned without an Exception, but due to an abort/interrupt invocation, and
the close succeeds after this - we'll try reporting the task as successful. If
there's an error from close, there's a possibility that the main thread would
have deregistered the task from the Reporter, and the TaskRunnerCallable thread
would fail while indicating an error. Throwing an InterruptedException here and
gracefully falling off the TaskRunnerCallable thread should take care of this.
After this, the main run() thread regains control and can shutdown.
For a future jira. Inputs / Outputs could check for interrupts during shutdown
to prevent costly spill attempts. The rest of the patch probably addresses this
though.
>>>>
>>> processorClosed=true, initializedInputs.remove, initializedOutputs.remove()
>>> - should be fore the actual invocation of close
- Fixed. Also removed log statements & unwanted try block.
>> pubilc enum State - Can be an isRunning
- Fixed. Added RuntimeTask.isRunning()
>> In TaskRunnerCallable, is there a need to check for InterruptedExceptions
- Yes, needed. If processor run is not consuming the interrupt, it might be
throwing interruptedException & in such cases, we need to reset the interrupt
status.
>> An exception from abortTask should probably be reported as a Failure
- Fixed.
>> Before taskFuture = executor.submit(callable); - checking the interrupt
>> status may be useful
- Fixed.
was (Author: rajesh.balamohan):
Attaching patch for TEZ-2003 branch.
> LogicalIOProcessorRuntimeTask, RuntimeTask, TezTaskRunner should handle
> interrupts & carry out necessary cleanups
> -----------------------------------------------------------------------------------------------------------------
>
> Key: TEZ-2414
> URL: https://issues.apache.org/jira/browse/TEZ-2414
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Rajesh Balamohan
> Assignee: Rajesh Balamohan
> Attachments: TEZ-2414.1.patch
>
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)