This is an automated email from the ASF dual-hosted git repository.

jiajunwang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/helix.git


The following commit(s) were added to refs/heads/master by this push:
     new 5b1a63e  Double check if the HelixTaskExecutor is shutting down before make Partition in ERROR state when schedule task fails. (#1515)
5b1a63e is described below

commit 5b1a63e79a5c5159813e0ce754e541ba42adb52a
Author: Jiajun Wang <[email protected]>
AuthorDate: Fri Nov 6 16:31:57 2020 -0800

    Double check if the HelixTaskExecutor is shutting down before make Partition in ERROR state when schedule task fails. (#1515)
    
    There is a race condition between the TaskExecutor thread pool shutting down and the message handler ceasing to listen.
    In this gap, a message will still be processed, but scheduling its task will fail.
    If we mark the partition as ERROR in that case, the controller-side logic might be confused.
    This PR adds a check to avoid the unnecessary ERROR state. Please note that this is a workaround rather than a complete fix.
---
 .../org/apache/helix/messaging/handling/HelixTaskExecutor.java | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)
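
As a rough illustration of the race described in the commit message, here is a minimal, self-contained Java sketch. It is not the Helix implementation: the class ShutdownAwareScheduler and its methods scheduleTask, handle, shutdown and errorPartitions are hypothetical names invented for this example, and a plain java.util.concurrent thread pool stands in for the task executor. It only shows the idea of skipping the ERROR marking when a scheduling failure coincides with shutdown.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

/** Hypothetical stand-in for a task executor; not part of the Helix API. */
public class ShutdownAwareScheduler {
  private final ExecutorService pool = Executors.newFixedThreadPool(1);
  // Set before the pool is shut down, so a rejected task can be told apart
  // from a genuine scheduling failure.
  private volatile boolean isShuttingDown = false;
  private final List<String> errorPartitions =
      Collections.synchronizedList(new ArrayList<>());

  /** Returns false when the pool rejects the task, e.g. after shutdown(). */
  public boolean scheduleTask(Runnable task) {
    try {
      pool.execute(task);
      return true;
    } catch (RejectedExecutionException e) {
      return false;
    }
  }

  /** Records an error only if scheduling failed while NOT shutting down. */
  public void handle(String partition, Runnable task) {
    if (!scheduleTask(task) && !isShuttingDown) {
      errorPartitions.add(partition);
    }
  }

  public void shutdown() {
    isShuttingDown = true;
    pool.shutdown();
  }

  public List<String> errorPartitions() {
    return errorPartitions;
  }

  public static void main(String[] args) {
    ShutdownAwareScheduler scheduler = new ShutdownAwareScheduler();
    scheduler.shutdown();
    // A message that arrives in the gap after the pool has stopped.
    scheduler.handle("partition_0", () -> { });
    System.out.println("ERROR partitions: " + scheduler.errorPartitions());
  }
}
```

In this sketch the flag is set before the pool is shut down, so any rejection caused by shutdown is seen with the flag already true. Whether the real executor gives the same ordering guarantee is not stated in the commit message.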

diff --git a/helix-core/src/main/java/org/apache/helix/messaging/handling/HelixTaskExecutor.java b/helix-core/src/main/java/org/apache/helix/messaging/handling/HelixTaskExecutor.java
index 424d96d..b3432d4 100644
--- a/helix-core/src/main/java/org/apache/helix/messaging/handling/HelixTaskExecutor.java
+++ b/helix-core/src/main/java/org/apache/helix/messaging/handling/HelixTaskExecutor.java
@@ -915,7 +915,15 @@ public class HelixTaskExecutor implements MessageListener, TaskExecutor {
     for (Map.Entry<String, MessageHandler> handlerEntry : stateTransitionHandlers.entrySet()) {
       MessageHandler handler = handlerEntry.getValue();
       NotificationContext context = stateTransitionContexts.get(handlerEntry.getKey());
-      if (!scheduleTaskForMessage(instanceName, accessor, handler, context)) {
+      if (!scheduleTaskForMessage(instanceName, accessor, handler, context) && !_isShuttingDown) {
+        /**
+         * TODO: Checking _isShuttingDown is a workaround to avoid unnecessary ERROR partition.
+         * TODO: We shall improve the shutdown process of the participant to clean up the workflow
+         * TODO: completely. In detail, there is a race condition between TaskExecutor thread
+         * TODO: pool shutting down and Message handler stops listening. In this gap, the message
+         * TODO: will still be processed but schedule will fail. If we mark partition into ERROR
+         * TODO: state, then the controller side logic might be confused.
+         */
         try {
           // Record error state to the message handler.
           handler.onError(new HelixException(String
