[ https://issues.apache.org/jira/browse/HELIX-681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16408337#comment-16408337 ]

ASF GitHub Bot commented on HELIX-681:
--------------------------------------

Github user zhan849 commented on a diff in the pull request:

    https://github.com/apache/helix/pull/152#discussion_r176182830
  
    --- Diff: helix-core/src/main/java/org/apache/helix/messaging/handling/HelixTask.java ---
    @@ -168,7 +169,14 @@ public HelixTaskResult call() {
     
           // forward relay messages attached to this message to other participants
           if (taskResult.isSuccess()) {
    -        forwardRelayMessages(accessor, _message, taskResult.getCompleteTime());
    +        try {
    +          forwardRelayMessages(accessor, _message, taskResult.getCompleteTime());
    +        } catch (Exception e) {
    +          // Fail to send relay message should not result in a task execution failure
    +          // Currently we don't log error to ZK to reduce writes as when accessor throws
    +          // exception, ZK might not be in good condition.
    +          logger.error("Failed to send relay messages.", e);
    --- End diff --
    
    will change


> Participant should not fail state transition on fail to delete / relay message
> ------------------------------------------------------------------------------
>
>                 Key: HELIX-681
>                 URL: https://issues.apache.org/jira/browse/HELIX-681
>             Project: Apache Helix
>          Issue Type: Bug
>            Reporter: Hao Zhang
>            Priority: Major
>
> Currently we have a general try-catch block in HelixTask and
> HelixTaskExecutor which, upon any exception thrown from the state transition
> routine, fails the state transition. However, there are at least the
> following cases in which the state transition should be considered successful:
>  * When we fail to delete the message after successfully handling it and
> updating the current state: the state transition has already completed, and
> the current state is consistent between the participant and ZK.
>  * When we fail to send out relay messages: relay messages provide only
> best-effort delivery, which has nothing to do with the state transition's
> result. If relaying fails, the controller will resend the message, which
> ensures correctness.
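
The two cases above share one idea: once the state transition itself has succeeded, follow-up steps are best-effort and their exceptions should be logged rather than turned into a task failure. Below is a minimal, hypothetical Java sketch of that idea. `BestEffortPostTransition`, `PostStep`, `TaskResult`, `runBestEffort`, and `handle` are illustrative names, not Helix APIs; the actual change is the try/catch around `forwardRelayMessages` in `HelixTask.call()` shown in the diff, and the ordering and conditions of the steps here are simplified assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch, not Helix code: the task result is decided by the state
// transition alone; best-effort follow-up steps only log their failures.
public class BestEffortPostTransition {
  private static final Logger LOG =
      Logger.getLogger(BestEffortPostTransition.class.getName());

  // Hypothetical stand-in for a follow-up step that may throw (e.g. a ZK write).
  @FunctionalInterface
  interface PostStep {
    void run() throws Exception;
  }

  // Hypothetical result type; Helix's real type is HelixTaskResult.
  static final class TaskResult {
    final boolean success;
    final List<String> failedBestEffortSteps = new ArrayList<>();

    TaskResult(boolean success) {
      this.success = success;
    }
  }

  // Runs a step and logs (instead of propagating) any exception, so it cannot
  // turn an already-successful state transition into a failure.
  static void runBestEffort(String name, PostStep step, TaskResult result) {
    try {
      step.run();
    } catch (Exception e) {
      LOG.log(Level.WARNING, "Best-effort step failed: " + name, e);
      result.failedBestEffortSteps.add(name);
    }
  }

  // Simplified flow (assumption): both follow-up steps run only after a
  // successful transition; real Helix ordering and conditions may differ.
  static TaskResult handle(boolean transitionSucceeded,
                           PostStep deleteMessage,
                           PostStep forwardRelayMessages) {
    TaskResult result = new TaskResult(transitionSucceeded);
    if (result.success) {
      runBestEffort("deleteMessage", deleteMessage, result);
      runBestEffort("forwardRelayMessages", forwardRelayMessages, result);
    }
    return result;
  }

  public static void main(String[] args) {
    // Both follow-up steps fail, yet the transition is still reported successful.
    TaskResult r = handle(true,
        () -> { throw new IllegalStateException("ZK write failed"); },
        () -> { throw new IllegalStateException("ZK write failed"); });
    System.out.println(r.success + " " + r.failedBestEffortSteps);
  }
}
```

Running `main` reports the transition as successful even though both follow-up steps threw. The failures only reach the local log, which mirrors the diff's choice not to write errors to ZK when the accessor is already throwing.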



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
