[GitHub] [rocketmq] areyouok commented on pull request #3382: FIX ISSUE#2706 the problem of returning SEND_OK after flush failed,The original pull request was #2707.

GitBox Mon, 22 Nov 2021 08:26:20 -0800


areyouok commented on pull request #3382:
URL: https://github.com/apache/rocketmq/pull/3382#issuecomment-975696391



   1. Could I see the specific error message? Did the error actually occur in production, or was it deliberately simulated?
   
   
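2. If I understand correctly, with the current change the broker cannot recover once the error occurs. To preserve high availability, this broker should be switched to read-only instead. Returning this error code to the client does not help, because the client cannot handle it either. If I were a user of the messaging system, I would ask the cluster maintainers why so many sends are failing.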
   
   
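3. The return code you mention for deciding whether to retry is the internal return code, defined in ResponseCode. SendStatus is the externally visible return code; normally, getting one back at all means the send succeeded (FLUSH_DISK_TIMEOUT, FLUSH_SLAVE_TIMEOUT, and SLAVE_NOT_AVAILABLE also count as success, and the message can be consumed). All other errors are thrown as exceptions.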
   
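   Why do all four SendStatus return codes count as success? I often use RAID 1 as an example. Suppose I am a storage administrator who combines two disks into a RAID 1 logical volume for users to read and write. Normally data is written to both disks. If one disk fails, I as the storage administrator should learn about it through an alert and replace the failed disk as soon as possible. Throughout that process my users do not need to know anything about it; they simply keep using a highly available read/write service.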
   
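   In practice at our company, as long as a SendStatus is returned, the program treats all four return codes as success (alerting is handled separately).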
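
   A minimal sketch of that caller-side policy, for illustration only. The nested `SendStatus` enum is a local stand-in that mirrors the four values of RocketMQ's public enum so the example compiles without the client jar; `SendResultPolicy`, `handle`, and the in-memory `alerts` list are hypothetical names, not RocketMQ APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical caller-side policy; not part of RocketMQ.
class SendResultPolicy {
    // Local stand-in mirroring the four values of RocketMQ's SendStatus.
    enum SendStatus { SEND_OK, FLUSH_DISK_TIMEOUT, FLUSH_SLAVE_TIMEOUT, SLAVE_NOT_AVAILABLE }

    // Hypothetical in-memory alert sink; a real system would page the operators.
    final List<String> alerts = new ArrayList<>();

    // Treats every SendStatus as success from the business code's point of view.
    // Degraded statuses only raise an alert for the cluster maintainers.
    boolean handle(SendStatus status) {
        if (status != SendStatus.SEND_OK) {
            alerts.add("degraded send: " + status);
        }
        return true;
    }

    public static void main(String[] args) {
        SendResultPolicy policy = new SendResultPolicy();
        for (SendStatus status : SendStatus.values()) {
            System.out.println(status + " -> success=" + policy.handle(status));
        }
        System.out.println("alerts raised: " + policy.alerts.size());
    }
}
```

   The point of the split is that the caller never branches on the status for correctness; only the operations side reacts to the degraded ones.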
   
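   Coming back to the current issue: if the error is already unrecoverable, the send method should throw an exception rather than add a return code to SendStatus (adding one to ResponseCode is fine).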
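
   A rough sketch of what this suggestion could look like, under stated assumptions. All types are local stand-ins so the example is self-contained: `InternalCode` plays the role of the internal codes in ResponseCode, `FLUSH_DISK_FAILED` is a hypothetical new internal code for the unrecoverable case, and `SendFailedException` is a hypothetical unchecked exception, not the one RocketMQ actually throws.

```java
// Hypothetical sketch; none of these types are RocketMQ classes.
class SendMappingSketch {
    // Local stand-in mirroring the four values of RocketMQ's SendStatus.
    enum SendStatus { SEND_OK, FLUSH_DISK_TIMEOUT, FLUSH_SLAVE_TIMEOUT, SLAVE_NOT_AVAILABLE }

    // Stand-in for internal response codes. FLUSH_DISK_FAILED is an assumed
    // new internal code; SendStatus itself is left unchanged.
    enum InternalCode {
        SUCCESS, FLUSH_DISK_TIMEOUT, FLUSH_SLAVE_TIMEOUT, SLAVE_NOT_AVAILABLE,
        FLUSH_DISK_FAILED, SYSTEM_ERROR
    }

    // Hypothetical exception surfaced to the caller of send().
    static class SendFailedException extends RuntimeException {
        final InternalCode code;

        SendFailedException(InternalCode code) {
            super("send failed, internal code: " + code);
            this.code = code;
        }
    }

    // Maps an internal code to the public status; anything that is not one of
    // the four "message accepted" outcomes becomes an exception.
    static SendStatus toSendStatus(InternalCode code) {
        switch (code) {
            case SUCCESS:
                return SendStatus.SEND_OK;
            case FLUSH_DISK_TIMEOUT:
                return SendStatus.FLUSH_DISK_TIMEOUT;
            case FLUSH_SLAVE_TIMEOUT:
                return SendStatus.FLUSH_SLAVE_TIMEOUT;
            case SLAVE_NOT_AVAILABLE:
                return SendStatus.SLAVE_NOT_AVAILABLE;
            default:
                throw new SendFailedException(code);
        }
    }

    public static void main(String[] args) {
        System.out.println(toSendStatus(InternalCode.SUCCESS));
        try {
            toSendStatus(InternalCode.FLUSH_DISK_FAILED);
        } catch (SendFailedException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

   With this shape, existing callers that treat any returned SendStatus as success keep working, and the unrecoverable case reaches them through the exception path they already handle.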


