[ https://issues.apache.org/jira/browse/HELIX-682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16408772#comment-16408772 ]
ASF GitHub Bot commented on HELIX-682:
--------------------------------------

Github user dasahcc commented on a diff in the pull request:

    https://github.com/apache/helix/pull/156#discussion_r176271788

    --- Diff: helix-core/src/main/java/org/apache/helix/controller/stages/MessageGenerationPhase.java ---
    @@ -121,6 +131,18 @@ public void process(ClusterEvent event) throws Exception {

               Message message = null;

    +          if (shouldCleanUpPendingMessage(pendingMessage, currentState,
    +              currentStateOutput.getEndTime(resourceName, partition, instanceName))) {
    +            logger.info(
    +                "Adding pending message {} on instance {} to GC. Msg: {}->{}, current state of resource {}:{} is {}",
    --- End diff --

    Let's not use the name "GC" for it.


> Stale message should not prevent controller from rebalancing resource
> ---------------------------------------------------------------------
>
>                 Key: HELIX-682
>                 URL: https://issues.apache.org/jira/browse/HELIX-682
>             Project: Apache Helix
>          Issue Type: Bug
>            Reporter: Hao Zhang
>            Priority: Major
>
> Currently, during MessageGenerationPhase, we skip rebalancing when there is a
> pending message. Though we assume that participants will delete messages when
> they finish the task, there are cases where ZK is unstable and a participant
> fails to do so, which leaves the message undeleted and thus blocks
> rebalancing.
> Ideally, on the controller side, we should try to delete the message as well:
> if a partition's current state is the same as the message's toState, or a
> totally invalid message remains, the controller should try to delete the
> message to unblock rebalancing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
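
The cleanup condition proposed in the issue description can be sketched roughly as follows. This is a minimal illustration only, not the code from the pull request: `StaleMessageSketch`, `PendingMessage`, and `shouldCleanUp` are hypothetical names, the real Helix `Message` class is not used, and the end-time argument visible in the diff is left out. The issue does not define a "totally invalid" message, so the sketch assumes it means a message missing its fromState or toState.

```java
class StaleMessageSketch {

    /** Hypothetical stand-in for a state-transition message; not the Helix Message class. */
    static final class PendingMessage {
        final String fromState;
        final String toState;

        PendingMessage(String fromState, String toState) {
            this.fromState = fromState;
            this.toState = toState;
        }
    }

    /**
     * Returns true when the controller could delete the pending message
     * instead of letting it block rebalancing.
     */
    static boolean shouldCleanUp(PendingMessage pending, String currentState) {
        if (pending == null) {
            // Nothing is pending, so there is nothing to clean up.
            return false;
        }
        // Assumption: an "invalid" message is one missing either end of the transition.
        if (pending.fromState == null || pending.toState == null) {
            return true;
        }
        // The transition already happened: the partition is in the message's target state.
        return pending.toState.equals(currentState);
    }

    public static void main(String[] args) {
        PendingMessage done = new PendingMessage("OFFLINE", "ONLINE");
        // Partition already reached ONLINE, so the message is stale.
        System.out.println(shouldCleanUp(done, "ONLINE"));
        // Partition is still OFFLINE, so the message is still in flight.
        System.out.println(shouldCleanUp(done, "OFFLINE"));
    }
}
```

A real implementation would probably also wait for a grace period after the current state's end time before deleting, so that a participant about to remove its own message is not raced. That is a guess based on the `getEndTime` argument in the diff.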