johnyangk commented on a change in pull request #1: [NEMO-26] Implement SourceLocationAwareSchedulingPolicy URL: https://github.com/apache/incubator-nemo/pull/1#discussion_r173067005
########## File path: runtime/master/src/main/java/edu/snu/nemo/runtime/master/RuntimeMaster.java ########## @@ -274,16 +271,12 @@ private void handleControlMessage(final ControlMessage.Message message) { convertFailureCause(taskGroupStateChangedMsg.getFailureCause())); break; case ExecutorFailed: + // Executor failed due to user code. final ControlMessage.ExecutorFailedMsg executorFailedMsg = message.getExecutorFailedMsg(); final String failedExecutorId = executorFailedMsg.getExecutorId(); final Exception exception = SerializationUtils.deserialize(executorFailedMsg.getException().toByteArray()); LOG.error(failedExecutorId + " failed, Stack Trace: ", exception); - containerManager.onExecutorRemoved(failedExecutorId); throw new RuntimeException(exception); Review comment: @sanha @seojangho Good catch! I've done some research on this. I think this does shut down the Driver process. It appears that we've been relying on `REEFUncaughtExceptionHandler` globally set for all threads to shut down the JVM on unchecked exceptions like this `RuntimeException`. This class seems to be the one logging the message "Thread xx threw an uncaught exception" we often see in the logs. Nonetheless, it may be better to provide our own `UncaughtExceptionHandler` for Nemo-specific threads. (e.g., scheduler thread, runtime master thread, executor thread) I think it'd be better for us to handle this issue in separate PR, so that we can handle all threads with one PR. If this sounds good to you, I'll open up a JIRA to track this issue. Thread.setDefaultUncaughtExceptionHandler: https://github.com/apache/reef/blob/master/lang/java/reef-common/src/main/java/org/apache/reef/runtime/common/REEFLauncher.java#L178 REEFUncaughtExceptionHandler: https://github.com/apache/reef/blob/master/lang/java/reef-common/src/main/java/org/apache/reef/runtime/common/launch/REEFUncaughtExceptionHandler.java ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services