[ https://issues.apache.org/jira/browse/HELIX-682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16409910#comment-16409910 ]
ASF GitHub Bot commented on HELIX-682:
--------------------------------------

Github user zhan849 commented on a diff in the pull request:

    https://github.com/apache/helix/pull/156#discussion_r176498024

    --- Diff: helix-core/src/main/java/org/apache/helix/controller/stages/MessageGenerationPhase.java ---
    @@ -121,6 +131,18 @@ public void process(ClusterEvent event) throws Exception {
               Message message = null;
    +          if (shouldCleanUpPendingMessage(pendingMessage, currentState,
    +              currentStateOutput.getEndTime(resourceName, partition, instanceName))) {
    +            logger.info(
    +                "Adding pending message {} on instance {} to GC. Msg: {}->{}, current state of resource {}:{} is {}",
    --- End diff --

    changed it to "cleanup"

> Stale message should not prevent controller from rebalancing resource
> ---------------------------------------------------------------------
>
>                 Key: HELIX-682
>                 URL: https://issues.apache.org/jira/browse/HELIX-682
>             Project: Apache Helix
>          Issue Type: Bug
>            Reporter: Hao Zhang
>            Priority: Major
>
> Currently, during MessageGenerationPhase, we skip rebalancing when there is a
> pending message. Though we assume that participants delete their messages when
> they finish the task, there are cases where ZK is unstable and a participant
> fails to do so, which leaves the message undeleted and thus blocks rebalancing.
> Ideally, the controller should try to delete the message as well: if the
> partition's current state is the same as the message's toState, or a totally
> invalid message remains, the controller should try to delete the message to
> unblock rebalancing.
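
For readers following the thread, the cleanup rule in the issue description can be sketched roughly as below. This is only an illustration under stated assumptions, not the code from the pull request: `PendingMessage`, `StaleMessageCheck`, and `isStale` are hypothetical names, the real `shouldCleanUpPendingMessage` also takes a state-transition end time that is ignored here, and "totally invalid" is assumed to mean that the message's fromState does not match the partition's current state.

```java
import java.util.Objects;

// Hypothetical, simplified stand-in for a state-transition message.
// It is NOT the Helix Message class; it only carries the two states
// that the rule in the issue description talks about.
final class PendingMessage {
  final String fromState;
  final String toState;

  PendingMessage(String fromState, String toState) {
    this.fromState = fromState;
    this.toState = toState;
  }
}

// Hypothetical helper illustrating when a controller might delete a
// pending message itself instead of waiting for the participant.
final class StaleMessageCheck {
  private StaleMessageCheck() {}

  static boolean isStale(PendingMessage pending, String currentState) {
    if (pending == null) {
      // Nothing is pending, so there is nothing to clean up.
      return false;
    }
    // Case 1 (from the issue): the partition already reached the
    // message's toState, so the transition finished but the message
    // was never deleted.
    if (Objects.equals(currentState, pending.toState)) {
      return true;
    }
    // Case 2 (assumed reading of "totally invalid"): the message starts
    // from a state the partition is not in, so it can never be applied.
    return !Objects.equals(currentState, pending.fromState);
  }
}
```

With this sketch, a SLAVE->MASTER message is treated as stale when the partition is already MASTER or is in an unrelated state such as OFFLINE, and is left alone while the partition is still SLAVE, since the transition may simply be in progress.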