One message was lost after recovering from an abnormal shutdown. The main log
entries for the problem are as follows:
Slave shutdown:
2018-09-11 16:37:55.950 WARN ShutdownHook - shutdown ReputMessageService, but
commitlog have not finish to be dispatched, CL: 80549937152 reputFromOffset:
80549937024
2018-09-11 16:37:55.964 WARN ShutdownHook - the store may be wrong, so shutdown
abnormally, and keep abort file.
The slave then recovered and synced messages from the master, producing the following warnings:
2018-09-11 16:46:46.976 WARN ReputMessageService - [BUG]logic queue order maybe
wrong, expectLogicOffset: 1050988860 currentLogicOffset: 1050988840 Topic:
role_change QID: Diff: 20
2018-09-11 16:46:46.976 WARN ReputMessageService - [BUG]logic queue order maybe
wrong, expectLogicOffset: 1050988880 currentLogicOffset: 1050988860 Topic:
role_change QID: Diff: 20
2018-09-11 16:46:46.977 WARN ReputMessageService - [BUG]logic queue order maybe
wrong, expectLogicOffset: 1050988900 currentLogicOffset: 1050988880 Topic:
role_change QID: Diff: 20
After the master was shut down, one message could not be consumed from the slave.
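The offsets in the logs above can be cross-checked with simple arithmetic. The sketch below assumes a consume queue entry is 20 bytes (8-byte commit log offset, 4-byte size, 8-byte tag hash code), which is the value of `ConsumeQueue.CQ_STORE_UNIT_SIZE` in RocketMQ 4.x; the class `LogOffsetMath` is hypothetical. It shows that 128 bytes of commit log were undispatched at shutdown and that each `Diff: 20` is exactly one consume queue entry, which is consistent with a single missing message.

```java
// Worked arithmetic on the offsets printed in the logs (hypothetical helper).
// Assumption: one consume queue entry is 20 bytes, as ConsumeQueue.CQ_STORE_UNIT_SIZE in RocketMQ 4.x.
final class LogOffsetMath {
    static final int CQ_STORE_UNIT_SIZE = 20;

    // Bytes written to the commit log but not yet dispatched when the slave shut down.
    static long undispatchedBytes(long commitLogMaxOffset, long reputFromOffset) {
        return commitLogMaxOffset - reputFromOffset;
    }

    // Number of consume queue entries between the expected and the actual write position.
    static long missingEntries(long expectLogicOffset, long currentLogicOffset) {
        long diff = expectLogicOffset - currentLogicOffset;
        if (diff % CQ_STORE_UNIT_SIZE != 0) {
            throw new IllegalArgumentException("diff is not a whole number of entries: " + diff);
        }
        return diff / CQ_STORE_UNIT_SIZE;
    }

    public static void main(String[] args) {
        System.out.println(undispatchedBytes(80549937152L, 80549937024L)); // 128
        System.out.println(missingEntries(1050988860L, 1050988840L));      // 1
    }
}
```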
Analysis:
During the abnormal shutdown, some messages had not been dispatched. After
recovery, ReputMessageService did not reput these messages because
duplicationEnable was not enabled. Is the parameter "duplicationEnable"
intended for this problem? However, I find that enabling it does not resolve
the problem either, because the confirmOffset it relies on is not saved.
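A minimal toy model of the behaviour described in the analysis, assuming the reput service resumes at the commit log's max offset when `duplicationEnable` is off and at the in-memory `confirmOffset` when it is on. `ReputStartModel` and its methods are hypothetical, not RocketMQ code, and the model also assumes recovery leaves the commit log max offset at the value logged at shutdown. With the logged offsets, the 128 bytes between `reputFromOffset` and `CL` are never dispatched.

```java
// Toy model of where reput resumes after a restart (hypothetical, not RocketMQ code).
final class ReputStartModel {
    // Assumed branch: without duplicationEnable, reput resumes at the end of the
    // commit log; with it, reput resumes at the (unpersisted) confirmOffset.
    static long chooseReputFromOffset(boolean duplicationEnable,
                                      long commitLogMaxOffset,
                                      long confirmOffset) {
        return duplicationEnable ? confirmOffset : commitLogMaxOffset;
    }

    // Bytes in [dispatchedUpTo, reputFromOffset) that never reach the consume
    // queues; zero if reput starts at or before the dispatched point.
    static long skippedBytes(long dispatchedUpTo, long reputFromOffset) {
        return Math.max(0L, reputFromOffset - dispatchedUpTo);
    }

    public static void main(String[] args) {
        long commitLogMax = 80549937152L;   // "CL" from the shutdown log
        long dispatchedUpTo = 80549937024L; // "reputFromOffset" from the shutdown log
        long from = chooseReputFromOffset(false, commitLogMax, 0L); // confirmOffset unused here
        System.out.println(skippedBytes(dispatchedUpTo, from)); // 128
    }
}
```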
Solution:
A: Persist confirmOffset (e.g. in the checkpoint file), and have
ReputMessageService reput messages from the saved confirmOffset after recovery.
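A minimal sketch of Solution A, assuming confirmOffset is stored in its own small file rather than inside the existing checkpoint file, whose layout is left untouched. `ConfirmOffsetFile`, its file name and its 8-byte big-endian layout are hypothetical. The value is written to a temporary file that is fsynced and then renamed over the old one, so a reader never sees a partially written value.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

// Hypothetical helper for Solution A: persist confirmOffset and read it back after recovery.
final class ConfirmOffsetFile {
    private final Path file;

    ConfirmOffsetFile(Path storeDir) {
        this.file = storeDir.resolve("confirmOffset"); // hypothetical file name
    }

    // Write to a temp file, fsync it, then atomically rename it over the old file.
    // (A fully durable version would also fsync the directory after the rename.)
    void save(long confirmOffset) throws IOException {
        Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        ByteBuffer buf = ByteBuffer.allocate(Long.BYTES);
        buf.putLong(confirmOffset).flip();
        try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            ch.force(true);
        }
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE);
    }

    // Returns the saved offset, or the fallback when nothing valid has been saved.
    long loadOr(long fallback) throws IOException {
        if (!Files.exists(file)) {
            return fallback;
        }
        byte[] bytes = Files.readAllBytes(file);
        if (bytes.length != Long.BYTES) {
            return fallback; // treat a malformed file as "no saved value"
        }
        return ByteBuffer.wrap(bytes).getLong();
    }
}
```

On start-up, the reput service would then begin from `loadOr(...)` instead of the commit log's max offset; what fallback to use when no file exists yet is left open in this sketch.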
B: In recover(), obtain the maximum physical offset that has already been
reput, as found while recovering the consume queues. Then reput messages from
that offset up to the max physical offset of the commit log. This works
whether or not duplicationEnable is enabled.
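A minimal sketch of Solution B as a toy model. `ReputFromConsumeQueues` and `CqEntry` are hypothetical names, not RocketMQ APIs; the sketch assumes entries within one consume queue are in commit log order and that dispatch is sequential, so the largest `commitLogOffset + size` across all queues marks how far dispatching got before the crash. The entry values in `main` are made up so that they end at the logged `reputFromOffset`. The start offset does not depend on `duplicationEnable`, which is the property Solution B relies on.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of Solution B (hypothetical names, not RocketMQ APIs).
final class ReputFromConsumeQueues {
    // One consume queue entry: where the message starts in the commit log and its size.
    static final class CqEntry {
        final long commitLogOffset;
        final int size;

        CqEntry(long commitLogOffset, int size) {
            this.commitLogOffset = commitLogOffset;
            this.size = size;
        }
    }

    // Largest end offset (offset + size) indexed by any queue. Entries in one queue
    // are assumed to be in commit log order, so only each queue's last entry is read.
    static long maxDispatchedPhyOffset(Map<String, List<CqEntry>> queues) {
        long max = 0L;
        for (List<CqEntry> queue : queues.values()) {
            if (queue.isEmpty()) {
                continue;
            }
            CqEntry last = queue.get(queue.size() - 1);
            max = Math.max(max, last.commitLogOffset + last.size);
        }
        return max;
    }

    // Start reput where the consume queues stop, never past the commit log's end.
    // The result is the same whether or not duplicationEnable is set.
    static long reputFromOffset(Map<String, List<CqEntry>> queues, long commitLogMaxOffset) {
        return Math.min(maxDispatchedPhyOffset(queues), commitLogMaxOffset);
    }

    public static void main(String[] args) {
        Map<String, List<CqEntry>> queues = new HashMap<>();
        queues.put("queueA", Arrays.asList(
                new CqEntry(80549936776L, 124), new CqEntry(80549936900L, 124)));
        queues.put("queueB", Collections.singletonList(new CqEntry(80549930000L, 200)));
        System.out.println(reputFromOffset(queues, 80549937152L)); // 80549937024
    }
}
```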
Which approach fits the overall design better? Or is there a better way?
thx~
[ Full content available at: https://github.com/apache/rocketmq/issues/467 ]
This message was relayed via gitbox.apache.org for [email protected]