johnyangk commented on a change in pull request #1: [NEMO-26] Implement
SourceLocationAwareSchedulingPolicy
URL: https://github.com/apache/incubator-nemo/pull/1#discussion_r173067005
##########
File path:
runtime/master/src/main/java/edu/snu/nemo/runtime/master/RuntimeMaster.java
##########
@@ -274,16 +271,12 @@ private void handleControlMessage(final
ControlMessage.Message message) {
convertFailureCause(taskGroupStateChangedMsg.getFailureCause()));
break;
case ExecutorFailed:
+ // Executor failed due to user code.
final ControlMessage.ExecutorFailedMsg executorFailedMsg =
message.getExecutorFailedMsg();
final String failedExecutorId = executorFailedMsg.getExecutorId();
final Exception exception =
SerializationUtils.deserialize(executorFailedMsg.getException().toByteArray());
LOG.error(failedExecutorId + " failed, Stack Trace: ", exception);
- containerManager.onExecutorRemoved(failedExecutorId);
throw new RuntimeException(exception);
Review comment:
@sanha @seojangho Good catch! I've done some research on this.
I think this does shut down the Driver process. It appears that we've been
relying on `REEFUncaughtExceptionHandler` globally set for all threads to shut
down the JVM on unchecked exceptions like this `RuntimeException`. This class
seems to be the one logging the message "Thread xx threw an uncaught exception"
we often see in the logs.
Nonetheless, it may be better to provide our own `UncaughtExceptionHandler`
for Nemo-specific threads. (e.g., scheduler thread, runtime master thread,
executor thread)
I think it'd be better for us to handle this issue in separate PR, so that
we can handle all threads with one PR. If this sounds good to you, I'll open up
a JIRA to track this issue.
Thread.setDefaultUncaughtExceptionHandler:
https://github.com/apache/reef/blob/master/lang/java/reef-common/src/main/java/org/apache/reef/runtime/common/REEFLauncher.java#L178
REEFUncaughtExceptionHandler:
https://github.com/apache/reef/blob/master/lang/java/reef-common/src/main/java/org/apache/reef/runtime/common/launch/REEFUncaughtExceptionHandler.java
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services