[ 
https://issues.apache.org/jira/browse/ROCKETMQ-272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16139963#comment-16139963
 ] 

ASF GitHub Bot commented on ROCKETMQ-272:
-----------------------------------------

GitHub user evthoriz opened a pull request:

    https://github.com/apache/incubator-rocketmq/pull/153

    [ROCKETMQ-272] Fix sync slave timeout when using SYNC_MASTER

    Jira: https://issues.apache.org/jira/browse/ROCKETMQ-272
    
    The timeout logic doesn't work correctly.
    A thread waiting in GroupTransferService may be woken up frequently by
    ReadSocketService in HAConnection.
    As a result, the transfer logic may return early and wake up the thread
    waiting in the HA handling, which makes the timeout value in the HA
    handling ineffective.
    
    This patch fixes the timeout logic used when syncing to the slave, and
    also introduces a separate option, `syncSlaveTimeout`, in
    `MessageStoreConfig` so that it is distinct from the disk flush timeout
    option.
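
    As an illustration of the problem described above (this is not the code
    from the patch), the sketch below contrasts a single timed wait, which
    returns on any wake-up, with a wait that tracks a deadline and keeps
    waiting for the remaining time. The class `DeadlineWait` and all of its
    method names are hypothetical and do not exist in RocketMQ.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; not part of RocketMQ.
public class DeadlineWait {
    private final Object lock = new Object();
    private boolean done = false;

    // Naive variant: one timed wait. Any notify ends the wait, even when
    // the condition is still false, so the timeout is not honoured.
    public boolean waitOnce(long timeoutMillis) {
        synchronized (lock) {
            try {
                if (!done) {
                    lock.wait(timeoutMillis);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return done;
        }
    }

    // Deadline variant: after every wake-up, re-check the condition and
    // wait again for whatever time is left until the deadline.
    public boolean waitUntilDone(long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        synchronized (lock) {
            try {
                while (!done) {
                    long remainingNanos = deadline - System.nanoTime();
                    if (remainingNanos <= 0) {
                        return false; // the full timeout has elapsed
                    }
                    // Object.wait(0) blocks indefinitely, so wait at least 1 ms.
                    lock.wait(Math.max(1L, TimeUnit.NANOSECONDS.toMillis(remainingNanos)));
                }
                return true;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    // Simulates a wake-up that carries no progress (condition unchanged).
    public void wakeWithoutProgress() {
        synchronized (lock) {
            lock.notifyAll();
        }
    }

    // Marks the condition as satisfied and wakes all waiters.
    public void complete() {
        synchronized (lock) {
            done = true;
            lock.notifyAll();
        }
    }
}
```

    With `waitOnce`, a caller that passes 5000 ms can return within a few
    milliseconds if another thread calls `wakeWithoutProgress()`, which
    resembles the symptom reported in the issue. `waitUntilDone` returns
    `false` only after the full timeout has elapsed.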

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/evthoriz/incubator-rocketmq debug-ha

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-rocketmq/pull/153.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #153
    
----
commit 6f2501a24a701368b6213fd5acb3355ebdaafeb6
Author: evthoriz <[email protected]>
Date:   2017-08-24T11:50:20Z

    [ROCKETMQ-272] Fix sync slave timeout when using SYNC_MASTER

----


> The config `syncFlushTimeout` doesn't work for SYNC_MASTER
> ----------------------------------------------------------
>
>                 Key: ROCKETMQ-272
>                 URL: https://issues.apache.org/jira/browse/ROCKETMQ-272
>             Project: Apache RocketMQ
>          Issue Type: Bug
>          Components: rocketmq-broker
>    Affects Versions: 4.1.0-incubating
>            Reporter: Yu Kaiyuan
>            Assignee: yukon
>
> `sendStatus=FLUSH_SLAVE_TIMEOUT` is returned quite frequently when sending
> large messages (>500k) in a SYNC_MASTER/SLAVE setup.
> As far as I can tell, the timeout used by the sync process is the config
> `syncFlushTimeout`, whose default value is 5000 milliseconds.
> However, the producer receives `FLUSH_SLAVE_TIMEOUT` in less than 1 second.
> So why does the config not work as expected?
> Relevant code:
> {code:java}
> // CommitLog.java
> public void handleHA(AppendMessageResult result, PutMessageResult putMessageResult, MessageExt messageExt) {
>     if (BrokerRole.SYNC_MASTER == this.defaultMessageStore.getMessageStoreConfig().getBrokerRole()) {
>         HAService service = this.defaultMessageStore.getHaService();
>         if (messageExt.isWaitStoreMsgOK()) {
>             // Determine whether to wait
>             if (service.isSlaveOK(result.getWroteOffset() + result.getWroteBytes())) {
>                 GroupCommitRequest request = new GroupCommitRequest(result.getWroteOffset() + result.getWroteBytes());
>                 service.putRequest(request);
>                 service.getWaitNotifyObject().wakeupAll();
>                 boolean flushOK =
>                     request.waitForFlush(this.defaultMessageStore.getMessageStoreConfig().getSyncFlushTimeout());
>                 if (!flushOK) {
>                     log.error("do sync transfer other node, wait return, but failed, topic: " + messageExt.getTopic() + " tags: "
>                         + messageExt.getTags() + " client address: " + messageExt.getBornHostNameString());
>                     putMessageResult.setPutMessageStatus(PutMessageStatus.FLUSH_SLAVE_TIMEOUT);
>                 }
>             }
>             // Slave problem
>             else {
>                 // Tell the producer, slave not available
>                 putMessageResult.setPutMessageStatus(PutMessageStatus.SLAVE_NOT_AVAILABLE);
>             }
>         }
>     }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
