This is an automated email from the ASF dual-hosted git repository.
jiajunwang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/helix.git
The following commit(s) were added to refs/heads/master by this push:
new 5b1a63e Double check if the HelixTaskExecutor is shutting down before
make Partition in ERROR state when schedule task fails. (#1515)
5b1a63e is described below
commit 5b1a63e79a5c5159813e0ce754e541ba42adb52a
Author: Jiajun Wang <[email protected]>
AuthorDate: Fri Nov 6 16:31:57 2020 -0800
Double check if the HelixTaskExecutor is shutting down before make
Partition in ERROR state when schedule task fails. (#1515)
There is a race condition between the TaskExecutor thread pool shutting down
and the message handler ceasing to listen.
In this gap, a message is still processed but scheduling its task fails.
If we mark the partition as ERROR at that point, the controller-side logic
might be confused.
This PR adds an additional check to avoid the unnecessary ERROR state.
Please note that this is a workaround rather than a complete fix.
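The guard described above can be illustrated with a minimal, self-contained
sketch. This is not the Helix implementation: the class ShutdownGuardSketch
and its methods schedule, onMessage, shutdown and errorCount are hypothetical
names, and a plain java.util.concurrent executor stands in for the
TaskExecutor thread pool. It only shows the idea of skipping the ERROR
marking when a scheduling failure coincides with shutdown.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

// Hypothetical stand-in for the executor; not part of Helix.
class ShutdownGuardSketch {
  private final ExecutorService pool;
  // Set before the pool is shut down, so a rejected task can be
  // attributed to shutdown rather than to a genuine scheduling failure.
  private volatile boolean isShuttingDown = false;
  private int errorCount = 0;

  ShutdownGuardSketch(ExecutorService pool) {
    this.pool = pool;
  }

  // Returns false when the pool refuses the task (e.g. after shutdown()).
  boolean schedule(Runnable task) {
    try {
      pool.execute(task);
      return true;
    } catch (RejectedExecutionException e) {
      return false;
    }
  }

  // Mirrors the shape of the patched condition: only record an error
  // when scheduling failed AND we are not in the middle of shutting down.
  void onMessage(Runnable task) {
    if (!schedule(task) && !isShuttingDown) {
      errorCount++; // stands in for marking the partition as ERROR
    }
  }

  void shutdown() {
    isShuttingDown = true;
    pool.shutdown();
  }

  int errorCount() {
    return errorCount;
  }

  public static void main(String[] args) {
    // Genuine failure: the pool is unusable but no shutdown was requested.
    ExecutorService dead = Executors.newSingleThreadExecutor();
    dead.shutdown();
    ShutdownGuardSketch genuine = new ShutdownGuardSketch(dead);
    genuine.onMessage(() -> {});
    System.out.println(genuine.errorCount()); // prints 1

    // Message arriving in the shutdown gap: no error is recorded.
    ShutdownGuardSketch closing =
        new ShutdownGuardSketch(Executors.newSingleThreadExecutor());
    closing.shutdown();
    closing.onMessage(() -> {});
    System.out.println(closing.errorCount()); // prints 0
  }
}
```

As in the commit, the flag only hides the symptom: a message that arrives in
the gap is still accepted and then silently not scheduled.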
---
.../org/apache/helix/messaging/handling/HelixTaskExecutor.java | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/helix-core/src/main/java/org/apache/helix/messaging/handling/HelixTaskExecutor.java b/helix-core/src/main/java/org/apache/helix/messaging/handling/HelixTaskExecutor.java
index 424d96d..b3432d4 100644
--- a/helix-core/src/main/java/org/apache/helix/messaging/handling/HelixTaskExecutor.java
+++ b/helix-core/src/main/java/org/apache/helix/messaging/handling/HelixTaskExecutor.java
@@ -915,7 +915,15 @@ public class HelixTaskExecutor implements MessageListener, TaskExecutor {
for (Map.Entry<String, MessageHandler> handlerEntry : stateTransitionHandlers.entrySet()) {
MessageHandler handler = handlerEntry.getValue();
NotificationContext context = stateTransitionContexts.get(handlerEntry.getKey());
- if (!scheduleTaskForMessage(instanceName, accessor, handler, context)) {
+ if (!scheduleTaskForMessage(instanceName, accessor, handler, context) && !_isShuttingDown) {
+ /**
+ * TODO: Checking _isShuttingDown is a workaround to avoid unnecessary ERROR partition.
+ * TODO: We shall improve the shutdown process of the participant to clean up the workflow
+ * TODO: completely. In detail, there is a race condition between TaskExecutor thread
+ * TODO: pool shutting down and Message handler stops listening. In this gap, the message
+ * TODO: will still be processed but schedule will fail. If we mark partition into ERROR
+ * TODO: state, then the controller side logic might be confused.
+ */
try {
// Record error state to the message handler.
handler.onError(new HelixException(String