johnyangk commented on a change in pull request #1: [NEMO-26] Implement 
SourceLocationAwareSchedulingPolicy
URL: https://github.com/apache/incubator-nemo/pull/1#discussion_r173067005
 
 

 ##########
 File path: 
runtime/master/src/main/java/edu/snu/nemo/runtime/master/RuntimeMaster.java
 ##########
 @@ -274,16 +271,12 @@ private void handleControlMessage(final 
ControlMessage.Message message) {
           convertFailureCause(taskGroupStateChangedMsg.getFailureCause()));
       break;
     case ExecutorFailed:
+      // Executor failed due to user code.
       final ControlMessage.ExecutorFailedMsg executorFailedMsg = 
message.getExecutorFailedMsg();
       final String failedExecutorId = executorFailedMsg.getExecutorId();
       final Exception exception = 
SerializationUtils.deserialize(executorFailedMsg.getException().toByteArray());
       LOG.error(failedExecutorId + " failed, Stack Trace: ", exception);
-      containerManager.onExecutorRemoved(failedExecutorId);
       throw new RuntimeException(exception);
 
 Review comment:
   @sanha @seojangho Good catch! I've done some research on this.
   
   I think this does shut down the Driver process. It appears that we've been 
relying on `REEFUncaughtExceptionHandler` globally set for all threads to shut 
down the JVM on unchecked exceptions like this `RuntimeException`. This class 
seems to be the one logging the message "Thread xx threw an uncaught exception" 
we often see in the logs.
   
   Nonetheless, it may be better to provide our own `UncaughtExceptionHandler` 
for Nemo-specific threads. (e.g., scheduler thread, runtime master thread, 
executor thread)
   
   I think it'd be better for us to handle this issue in separate PR, so that 
we can handle all threads with one PR. If this sounds good to you, I'll open up 
a JIRA to track this issue.
   
   Thread.setDefaultUncaughtExceptionHandler:
   
https://github.com/apache/reef/blob/master/lang/java/reef-common/src/main/java/org/apache/reef/runtime/common/REEFLauncher.java#L178
   
   REEFUncaughtExceptionHandler:
   
https://github.com/apache/reef/blob/master/lang/java/reef-common/src/main/java/org/apache/reef/runtime/common/launch/REEFUncaughtExceptionHandler.java

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

Reply via email to