Zameer Manji created AURORA-1582:
------------------------------------

             Summary: Task History Pruning attempts can fail silently
                 Key: AURORA-1582
                 URL: https://issues.apache.org/jira/browse/AURORA-1582
             Project: Aurora
          Issue Type: Bug
            Reporter: Zameer Manji


As discovered in AURORA-1580, task history pruning attempts can fail, and when 
they do, they fail silently. The root cause seems to be that AsyncModule's 
{{AsyncProcessor}} threads only log an unhandled exception and take no further 
action:
{noformat}
  private static void evaluateResult(Runnable runnable, Throwable throwable, Logger logger) {
    // See java.util.concurrent.ThreadPoolExecutor#afterExecute(Runnable, Throwable)
    // for more details and an implementation example.
    if (throwable == null) {
      if (runnable instanceof Future) {
        try {
          Future<?> future = (Future<?>) runnable;
          if (future.isDone()) {
            future.get();
          }
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
        } catch (ExecutionException ee) {
          logger.error(ee.toString(), ee);
        }
      }
    } else {
      logger.error(throwable.toString(), throwable);
    }
  }
{noformat}

I think that instead of failing silently when work on these threads fails, we 
should shut down the scheduler, just as we do when the preemptor or another 
Guava service fails. This way the scheduler does not enter an undefined state 
and operators are informed of the abnormal behaviour.
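
A rough sketch of what this could look like, not the actual Aurora change: the 
class name {{FailFastExecutor}} and the {{onFailure}} callback are hypothetical 
and stand in for whatever mechanism the scheduler uses to shut itself down. The 
only difference from the snippet above is that a task failure is handed to the 
callback instead of being logged and dropped.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/**
 * Hypothetical sketch: an executor that escalates task failures to a
 * caller-supplied handler instead of only logging them.
 */
class FailFastExecutor extends ThreadPoolExecutor {
  // Assumption: in the scheduler this would trigger an orderly shutdown.
  private final Consumer<Throwable> onFailure;

  FailFastExecutor(int threads, Consumer<Throwable> onFailure) {
    super(threads, threads, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
    this.onFailure = onFailure;
  }

  @Override
  protected void afterExecute(Runnable runnable, Throwable throwable) {
    super.afterExecute(runnable, throwable);
    Throwable failure = throwable;
    // Work passed to submit() is wrapped in a Future, which captures the
    // exception; it has to be extracted with get().
    if (failure == null && runnable instanceof Future) {
      Future<?> future = (Future<?>) runnable;
      if (future.isDone()) {
        try {
          future.get();
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
        } catch (CancellationException ce) {
          // A cancelled task is not treated as a failure in this sketch.
        } catch (ExecutionException ee) {
          failure = ee.getCause() != null ? ee.getCause() : ee;
        }
      }
    }
    if (failure != null) {
      // Escalate instead of logging and carrying on.
      onFailure.accept(failure);
    }
  }
}
```

The callback keeps the executor independent of how shutdown is performed. It 
runs on the worker thread, so it should hand off to the shutdown path rather 
than block.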



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
