[
https://issues.apache.org/jira/browse/HELIX-778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16670955#comment-16670955
]
Hudson commented on HELIX-778:
------------------------------
FAILURE: Integrated in Jenkins build helix #1561 (See
[https://builds.apache.org/job/helix/1561/])
[HELIX-778] TASK: Fix a race condition in (hulee: rev
ceba1a55ae351090144c001324f908f2364212a4)
* (edit)
helix-core/src/test/java/org/apache/helix/integration/task/TestUnregisteredCommand.java
* (edit)
helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
> TASK: Fix a race condition in updatePreviousAssignedTasksStatus
> ---------------------------------------------------------------
>
> Key: HELIX-778
> URL: https://issues.apache.org/jira/browse/HELIX-778
> Project: Apache Helix
> Issue Type: Improvement
> Reporter: Hunter L
> Assignee: Hunter L
> Priority: Major
>
> TestUnregisteredCommand was observed to be very unstable. The cause was
> identified as a race condition: when a task fails, a pending message for
> that task (from INIT to RUNNING) is sometimes not cleaned up in time, so
> AbstractTaskDispatcher's updatePreviousAssignedTasksStatus tries to process
> that message and skips the status update for that task (such as updating
> its status and the NUM_ATTEMPTS field in JobContext).
> A short-term fix is to call markPartitionError() before checking for the
> pending message, but in the long run the design of the task status update
> should be revisited to avoid this type of race condition.
> Changelist:
> 1. Move markPartitionError() up before checking for a pending message on the
> task
> 2. Fix TestUnregisteredCommand's instability
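The ordering change in item 1 can be illustrated with a minimal sketch. This is
a simplified, hypothetical model and not the code in AbstractTaskDispatcher:
the class TaskStatusOrderingSketch, the Context type, the State enum, and both
update methods are invented for illustration. It also assumes that
markPartitionError() both records the error state and counts the attempt, which
the description suggests but does not spell out.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified model of the ordering issue; not Helix's actual code. */
final class TaskStatusOrderingSketch {

    /** Assumed task states, for this sketch only. */
    enum State { INIT, RUNNING, TASK_ERROR }

    /** Stand-in for the per-task bookkeeping that JobContext holds. */
    static final class Context {
        final Map<Integer, State> states = new HashMap<>();
        final Map<Integer, Integer> numAttempts = new HashMap<>();

        /** Assumption: marking an error records the state and counts the attempt. */
        void markPartitionError(int partition) {
            states.put(partition, State.TASK_ERROR);
            numAttempts.merge(partition, 1, Integer::sum);
        }
    }

    private TaskStatusOrderingSketch() {}

    /** Old ordering: a stale pending message short-circuits the status update. */
    static void updateCheckingPendingFirst(Context ctx, int partition,
                                           boolean taskFailed, boolean hasPendingMessage) {
        if (hasPendingMessage) {
            return; // stale INIT -> RUNNING message: the failure is never recorded
        }
        if (taskFailed) {
            ctx.markPartitionError(partition);
        }
    }

    /** New ordering: record the failure first, then look at the pending message. */
    static void updateMarkingErrorFirst(Context ctx, int partition,
                                        boolean taskFailed, boolean hasPendingMessage) {
        if (taskFailed) {
            ctx.markPartitionError(partition);
        }
        if (hasPendingMessage) {
            return; // the pending message is still respected, but the failure is already recorded
        }
        // Any remaining per-task processing would follow here.
    }
}
```

With a failed task and a leftover pending message, the first method leaves the
context untouched, while the second records the error state and one attempt.
The sketch only shows why the order of the two checks matters; it does not
address the underlying race, which the description leaves for a later redesign.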
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)