[ https://issues.apache.org/jira/browse/AURORA-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15111758#comment-15111758 ]
Zameer Manji edited comment on AURORA-1582 at 1/22/16 1:39 AM:
---------------------------------------------------------------
{noformat}
commit c89fecbcd93aa6ddcf6af60f2a2cb6315b7a4d19
Author: Zameer Manji <[email protected]>
Date: Thu Jan 21 17:38:25 2016 -0800
Turn TaskHistoryPruner into a service and trigger shutdown on pruning failure.
Task pruning is key to operating a large cluster and failure to prune should
trigger shutdown to prevent unbounded growth of storage. This patch turns
`TaskHistoryPruner` into a service which propagates failure from failed pruning
attempts towards the `ServiceManager`. Also completing a TODO which removes a
test for behaviour that is very awkward to test for.
Bugs closed: AURORA-1582
Reviewed at https://reviews.apache.org/r/42332/
3 files changed, 68 insertions(+), 111 deletions(-)
{noformat}
was (Author: zmanji):
````
commit c89fecbcd93aa6ddcf6af60f2a2cb6315b7a4d19
Author: Zameer Manji <[email protected]>
Date: Thu Jan 21 17:38:25 2016 -0800
Turn TaskHistoryPruner into a service and trigger shutdown on pruning failure.
Task pruning is key to operating a large cluster and failure to prune should
trigger shutdown to prevent unbounded growth of storage. This patch turns
`TaskHistoryPruner` into a service which propagates failure from failed pruning
attempts towards the `ServiceManager`. Also completing a TODO which removes a
test for behaviour that is very awkward to test for.
Bugs closed: AURORA-1582
Reviewed at https://reviews.apache.org/r/42332/
3 files changed, 68 insertions(+), 111 deletions(-)
````
> Task History Pruning attempts can fail silently
> -----------------------------------------------
>
> Key: AURORA-1582
> URL: https://issues.apache.org/jira/browse/AURORA-1582
> Project: Aurora
> Issue Type: Bug
> Reporter: Zameer Manji
> Assignee: Zameer Manji
>
> As discovered in AURORA-1580, task history pruning attempts can fail, and
> when they do, they fail silently. The root cause seems to be that
> AsyncModule's {{AsyncProcessor}} threads only log an unhandled exception
> when one occurs:
> {noformat}
> private static void evaluateResult(Runnable runnable, Throwable throwable, Logger logger) {
>   // See java.util.concurrent.ThreadPoolExecutor#afterExecute(Runnable, Throwable)
>   // for more details and an implementation example.
>   if (throwable == null) {
>     if (runnable instanceof Future) {
>       try {
>         Future<?> future = (Future<?>) runnable;
>         if (future.isDone()) {
>           future.get();
>         }
>       } catch (InterruptedException ie) {
>         Thread.currentThread().interrupt();
>       } catch (ExecutionException ee) {
>         logger.error(ee.toString(), ee);
>       }
>     }
>   } else {
>     logger.error(throwable.toString(), throwable);
>   }
> }
> {noformat}
> I think that instead of failing silently when work on these threads fails, we
> should shut down the scheduler, just as we do when the preemptor or another
> Guava service fails. This way the scheduler does not enter an undefined state
> and operators are informed of the abnormal behaviour.
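
As an illustration of the proposal, here is a minimal sketch that uses only java.util.concurrent. It is not the Aurora patch, which (per the commit message) turns TaskHistoryPruner into a service that reports to the ServiceManager. The class name FailFastExecutor, the onFailure callback and the runAndCaptureFailure helper are hypothetical names introduced for this example. The sketch follows the same afterExecute pattern as evaluateResult, but hands the failure to a caller-supplied handler rather than only logging it.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

// Hypothetical executor: reports task failures to a callback instead of only logging them.
final class FailFastExecutor extends ThreadPoolExecutor {
  private final Consumer<Throwable> onFailure;

  FailFastExecutor(int threads, Consumer<Throwable> onFailure) {
    super(threads, threads, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
    this.onFailure = onFailure;
  }

  @Override
  protected void afterExecute(Runnable runnable, Throwable throwable) {
    super.afterExecute(runnable, throwable);
    Throwable failure = throwable;
    // Tasks passed to submit() are wrapped in a Future that swallows the
    // exception, so it has to be extracted explicitly.
    if (failure == null && runnable instanceof Future) {
      Future<?> future = (Future<?>) runnable;
      if (future.isDone()) {
        try {
          future.get();
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
        } catch (CancellationException ce) {
          // Assumption of this sketch: a cancelled task is not a failure.
        } catch (ExecutionException ee) {
          failure = ee.getCause();
        }
      }
    }
    if (failure != null) {
      // In a scheduler, this handler would be the place to trigger shutdown.
      onFailure.accept(failure);
    }
  }

  // Runs a single task and returns the failure passed to the handler, or null.
  static Throwable runAndCaptureFailure(Runnable task) {
    AtomicReference<Throwable> seen = new AtomicReference<>();
    FailFastExecutor executor = new FailFastExecutor(1, seen::set);
    executor.submit(task);
    executor.shutdown();
    try {
      executor.awaitTermination(5, TimeUnit.SECONDS);
    } catch (InterruptedException ie) {
      Thread.currentThread().interrupt();
    }
    return seen.get();
  }

  public static void main(String[] args) {
    Throwable failure = runAndCaptureFailure(() -> {
      throw new IllegalStateException("pruning failed");
    });
    System.out.println("failure reported: " + failure);
  }
}
```

Passing the handler in at construction leaves the policy (log, alert, or shut down) with the owner of the executor, which is the role the ServiceManager plays in the committed change.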