[
https://issues.apache.org/jira/browse/TEZ-1752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14493493#comment-14493493
]
Siddharth Seth commented on TEZ-1752:
-------------------------------------
[~rajesh.balamohan] - thanks for taking this up. Haven't looked at the entire
patch yet (mostly skipped bits relating to cleanup).
- Re-setting the interrupt status in most places looks good. It probably needs
to be set in a few more places, and some parts could throw an
InterruptedException itself.
- In LogicalIOProcessorRuntimeTask - it would be useful to log the interrupt
status between each close invocation, and potentially set it if the I/O/P being
closed ends up unsetting it.
- cleanup would behave differently if initialize hasn't been invoked. We may
need to track which I O Ps have been initialized - and close just those - In
the MergeManager, InterruptedException thrown by MergeThraed.close likely needs
to be handled (otherwise it'll end up skipping cleanup?)
- In the invocation of finalMerge - an IOException is caught, are there
specific cases here where this IO exception is actually masking an interrupt ?
(and as a result the interrupt status needs to be set)
- The TezMerger change - should we just change the interface to throw
InterruptedException, instead of setting the flag. That's a private method, and
will force consumers within the IOs to handle it.
- UnorderedPartitionedKVWriter / others - in the close method, instead of
returning an empty event list - should this just throw an InterruptedException
back ?
- Is the change in the TaskReporter required ? taskFailed shouldn't be invoked
after the currentTask has been unregistered.
We likely need to ensure that cleanup / close methods aren't called twice -
once during regular cleanup, second during an interrupt while the cleanup is in
progress.
Not directly related to interrupts - but an invocation on Task.close() (regular
flow) can cause exceptions during Processor close or Input / Output close -
which would prevent subsequent Inputs / Outputs from being closed. Do we need
to make sure that close() gets invoked on subsequent Inputs / Outputs despite a
prior exception ?
Eventually, We should consider changing the API on Readers / Writers to throw
an InterruptedException rather than relying on IOException - so that Processors
can find out if an Input / Output gets interrupted - rather than ending up with
an error. That'll likely be 0.8 though.
Was talking to [~hitesh] offline about this for a bit - for this to work with
regular containers, a lookup on the Inputs / Outputs / Processors to find out
whether they are actually interruptable would be useful. Either via an
annotation or via an explicit API lookup (input.isInterrutable). I think this
can be a separate follow up jira though.
> Inputs / Outputs in the Runtime library should be interruptable
> ---------------------------------------------------------------
>
> Key: TEZ-1752
> URL: https://issues.apache.org/jira/browse/TEZ-1752
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Siddharth Seth
> Attachments: TEZ-1752.1.patch
>
>
> Not possible to preempt tasks without killing containers without this.
> There's still the problem of Processors not supporting interrupts. We may
> need API enhancements to either query IPOs on whether they are interrupbtible.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)