Hi, All

Yesterday our cluster experienced a sudden loss of power. When we restarted
the broker after power was restored, an exception occurred:


The exception indicated that the userRecordType that was loaded was invalid.
The operations team deleted the data journals, and the broker then started
successfully.


Unfortunately, we did not back up the problematic journal files. We checked
the dmesg output and found no disk errors, and SMART tests showed no disk
faults either. We then dug into the code (JournalImpl::readJournalFile) to
look for a cause, and we have two doubts about it.


First doubt:

The comment says "I - We scan for any valid record on the file. If a hole
happened on the middle of the file we keep looking until all the
possibilities are gone".

Since we only append to a journal file and fileId is strictly increasing,
it seems we could skip the rest of the file as soon as a record's fileId
does not equal the file's id. IMO the remaining records in the file would
have the same mismatch, so there is no need to read them. If we instead
keep looking through all the possibilities, is there a (very low) chance
that we assemble a record whose fileId, recordType and checkSize all
qualify but which never actually existed?
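
To make the trade-off concrete, here is a minimal sketch comparing the two
strategies. It assumes a simplified, hypothetical record layout (type,
fileId, body length, body, trailing checkSize), not the real Artemis
on-disk format, and every name in it (JournalScanSketch, scan,
validRecordSize, the type code 11) is made up for illustration. The file
holds two records for fileId 9 with a stale fileId 7 record between them,
standing in for a "hole" where an earlier write may not have reached disk
while a later one did.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class JournalScanSketch {
    // Hypothetical simplified layout, NOT the real Artemis on-disk format:
    // [recordType:1][fileId:4][bodyLen:4][body:bodyLen][checkSize:4]
    static final int HEADER = 1 + 4 + 4;
    static final int TRAILER = 4;
    static final byte ADD_RECORD = 11; // hypothetical type code

    static void putRecord(ByteBuffer buf, int fileId, byte[] body) {
        int size = HEADER + body.length + TRAILER;
        buf.put(ADD_RECORD).putInt(fileId).putInt(body.length).put(body).putInt(size);
    }

    // Size of the record starting at pos if it passes every check, otherwise -1.
    static int validRecordSize(ByteBuffer file, int pos, int expectedFileId) {
        if (pos + HEADER + TRAILER > file.limit()) return -1;
        if (file.get(pos) != ADD_RECORD) return -1;
        if (file.getInt(pos + 1) != expectedFileId) return -1;
        int bodyLen = file.getInt(pos + 5);
        long size = (long) HEADER + bodyLen + TRAILER;
        if (bodyLen < 0 || pos + size > file.limit()) return -1;
        // The trailing checkSize must repeat the total record size.
        return file.getInt(pos + HEADER + bodyLen) == size ? (int) size : -1;
    }

    // Returns the start offsets of all records accepted for expectedFileId.
    // stopOnMismatch = true gives up at the first position that fails validation;
    // false keeps probing byte by byte until the end of the file.
    static List<Integer> scan(ByteBuffer file, int expectedFileId, boolean stopOnMismatch) {
        List<Integer> accepted = new ArrayList<>();
        int pos = 0;
        while (pos + HEADER + TRAILER <= file.limit()) {
            int size = validRecordSize(file, pos, expectedFileId);
            if (size > 0) {
                accepted.add(pos);
                pos += size;
            } else if (stopOnMismatch) {
                break;
            } else {
                pos++;
            }
        }
        return accepted;
    }

    // Three 17-byte records: fileId 9, stale fileId 7 (the "hole"), fileId 9.
    static ByteBuffer fileWithHole() {
        ByteBuffer file = ByteBuffer.allocate(51);
        byte[] body = {1, 2, 3, 4};
        putRecord(file, 9, body);
        putRecord(file, 7, body);
        putRecord(file, 9, body);
        file.flip();
        return file;
    }

    public static void main(String[] args) {
        ByteBuffer file = fileWithHole();
        System.out.println("keep looking:     " + scan(file, 9, false)); // [0, 34]
        System.out.println("stop on mismatch: " + scan(file, 9, true));  // [0]
    }
}
```

In this sketch, stopping at the first mismatch loses the record at offset
34, while the byte-by-byte scan recovers it. The price is that every byte
offset becomes a candidate record start, which is exactly where our worry
about an accidentally "assembled" record comes from.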

Second doubt:

In the case of a power outage where only part of a record is written to
disk, e.g. only recordType and fileId are successfully written, could we
end up reading the rest of an old record even though the fileId is the
latest?
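
The scenario we have in mind can be sketched as follows. Again this uses a
hypothetical simplified layout (type, fileId, body length, body, trailing
checkSize) rather than the real Artemis format, and the names
(TornWriteSketch, tornWrite, readIfValid) are ours, purely for
illustration. A new record is written over a stale one in a reused file,
but only the first few bytes reach disk before power is lost.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class TornWriteSketch {
    // Hypothetical simplified layout, NOT the real Artemis on-disk format:
    // [recordType:1][fileId:4][bodyLen:4][body:bodyLen][checkSize:4]
    static final int HEADER = 1 + 4 + 4;
    static final int TRAILER = 4;
    static final byte ADD_RECORD = 11; // hypothetical type code

    static byte[] encode(int fileId, String body) {
        byte[] b = body.getBytes(StandardCharsets.US_ASCII);
        int size = HEADER + b.length + TRAILER;
        return ByteBuffer.allocate(size)
                .put(ADD_RECORD).putInt(fileId).putInt(b.length).put(b).putInt(size)
                .array();
    }

    // Returns the body if the record at offset 0 passes the recordType,
    // fileId and checkSize checks, otherwise null.
    static String readIfValid(byte[] disk, int expectedFileId) {
        if (disk.length < HEADER + TRAILER) return null;
        ByteBuffer buf = ByteBuffer.wrap(disk);
        if (buf.get(0) != ADD_RECORD || buf.getInt(1) != expectedFileId) return null;
        int bodyLen = buf.getInt(5);
        long size = (long) HEADER + bodyLen + TRAILER;
        if (bodyLen < 0 || size > disk.length) return null;
        if (buf.getInt(HEADER + bodyLen) != size) return null;
        return new String(disk, HEADER, bodyLen, StandardCharsets.US_ASCII);
    }

    // Simulates a power loss after only the first `persisted` bytes of the
    // new record reached disk; the remaining bytes are still the old record.
    static byte[] tornWrite(byte[] oldBytes, byte[] newBytes, int persisted) {
        byte[] disk = oldBytes.clone();
        System.arraycopy(newBytes, 0, disk, 0, persisted);
        return disk;
    }

    public static void main(String[] args) {
        byte[] stale = encode(7, "OLD!");

        // Only recordType + fileId (5 bytes) of the new record were persisted.
        byte[] sameSize = tornWrite(stale, encode(9, "NEW!"), 5);
        System.out.println(readIfValid(sameSize, 9)); // OLD!

        // The length field (9 bytes in total) was persisted too and differs
        // from the old one, so the size check rejects the record.
        byte[] otherSize = tornWrite(stale, encode(9, "NEWER!"), 9);
        System.out.println(readIfValid(otherSize, 9)); // null
    }
}
```

In this toy model the stale body is accepted under the new fileId whenever
the old length and checkSize are still consistent with each other, and it
is rejected only if a different length also made it to disk. Whether the
real journal can hit the same case is what we are asking.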

Can anyone shed some light on this, please? Thanks.
