A little correction: that description was for a simple AddData.
Look at the extensions of JournalInternalRecord for a better description of the data format.

On Tue, Jul 16, 2019 at 10:43 AM Clebert Suconic <[email protected]> wrote:
>
> It seems you are using 2.4.0. It does not seem related, but it would be
> important to have this fix on your system:
>
> commit 6b1abd1aadc2d097e3baefeb312c8e68092876ba
> Author: Clebert Suconic <[email protected]>
> Date:   Sun Aug 26 15:55:56 2018 -0400
>
>     ARTEMIS-2053 avoiding data loss after compacting
>
> However, let me explain how the record scanning works.
>
> The format of the data is, roughly:
>
> JOURNAL-RECORD-TYPE (byte)
> FILE-ID (int)
> compact-count (byte)
> recordID (long)
> recordSize, from persisters (int)
> userRecordType
> total-record-size
>
> When we recycle a file, we simply change the fileID on the header, and
> when we load the file, the scan is done by matching the record-type,
> and the total-record-size at the end of the record has to match the
> record size.
>
> I did this to avoid filling up the file with zeros, which was a costly
> operation at the time (I wrote this when disks were still mechanical),
> and it is still a costly operation today.
>
> So, to wrongly trick the scan you would need a matching record-type
> byte, a matching fileID at the next int, and a recordSize and
> total-record-size that match each other.
>
> Perhaps the loading is skipping the verification on total-record-size,
> and that let an invalid record sneak in?
>
> Or perhaps the commit you are missing, which I mentioned above, caused
> the issue?
>
> On Mon, Jul 1, 2019 at 5:57 AM yw yw <[email protected]> wrote:
> >
> > Hi all,
> >
> > Yesterday our cluster experienced a sudden loss of power. When we
> > started the broker after power was restored, an exception occurred.
> >
> > The exception showed that the userRecordType loaded was illegal. The
> > operations team deleted the data journals and the broker started
> > successfully.
> >
> > It is a pity we didn't back up the problematic journal files. We
> > checked the dmesg output: no disk errors. SMART tests also showed the
> > disk was not broken. Then we dug into the code
> > (JournalImpl::readJournalFile) and tried to find something. We have
> > two doubts about the code.
> >
> > First doubt:
> >
> > The comment says "I - We scan for any valid record on the file. If a
> > hole happened on the middle of the file we keep looking until all the
> > possibilities are gone".
> >
> > Since we append to the journal file and the fileId is strictly
> > increasing, we could skip the whole file as soon as a record's fileId
> > is not equal to the file's id. IMO the rest of the records in the file
> > would be the same, so there is no need to read them. If we keep
> > looking at all the possibilities, is there a (very low) possibility
> > that we assemble a record whose fileId, recordType, and checkSize all
> > qualify but which does not actually exist?
> >
> > Our second doubt:
> >
> > In the case of a power outage where only part of a record is written
> > to disk, e.g. the recordType and fileId are successfully written,
> > could we read an old record even though the fileId is the latest?
> >
> > Can anyone shed some light on this, please? Thanks.
>
> --
> Clebert Suconic

--
Clebert Suconic
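
For readers following the thread, here is a minimal Java sketch of the
scan validation Clebert describes: match the record-type byte, match the
fileID, then check that the trailing total-record-size agrees with the
bytes actually read. The field widths follow the layout in the email; the
class and method names and the exact trailing-size rule are assumptions
for illustration, not the actual Artemis code (see JournalImpl and the
JournalInternalRecord subclasses for the real implementation).

import java.nio.ByteBuffer;

// Minimal sketch (not the actual Artemis code) of the record scan
// validation described in the thread above.
public class JournalScanSketch {

    // byte + int + byte + long + int + byte = 19 bytes of header,
    // plus a trailing int for total-record-size.
    private static final int HEADER_BYTES = 19;
    private static final int TRAILER_BYTES = 4;

    // Returns true if the bytes at the buffer's position look like a
    // valid record of the expected type belonging to the expected file.
    // On failure the buffer position is restored so the scan can advance
    // byte by byte and keep looking, as the journal loader does.
    static boolean looksLikeValidRecord(ByteBuffer buf,
                                        byte expectedType,
                                        int expectedFileId) {
        if (buf.remaining() < HEADER_BYTES + TRAILER_BYTES) {
            return false;
        }
        int start = buf.position();

        byte recordType = buf.get();      // JOURNAL-RECORD-TYPE (byte)
        int fileId = buf.getInt();        // FILE-ID (int)
        byte compactCount = buf.get();    // compact-count (byte)
        long recordId = buf.getLong();    // recordID (long)
        int recordSize = buf.getInt();    // recordSize, from persisters (int)
        byte userRecordType = buf.get();  // userRecordType (byte)

        if (recordType != expectedType || fileId != expectedFileId
                || recordSize < 0
                || buf.remaining() < recordSize + TRAILER_BYTES) {
            buf.position(start);
            return false;
        }

        buf.position(buf.position() + recordSize);  // skip the record body
        int totalRecordSize = buf.getInt();         // trailing total-record-size

        // Assumed rule: the trailing size must agree with the bytes
        // consumed. Stale records in a recycled file fail the fileId
        // check; torn (partially written) records fail this size check.
        boolean valid = totalRecordSize == buf.position() - start;
        if (!valid) {
            buf.position(start);
        }
        return valid;
    }
}

This is also where the trade-off Clebert mentions shows up: because
recycled files are not zeroed, the fileID match plus the trailing-size
check are what distinguish live records from leftover bytes of the file's
previous life, at the (very low) risk of a false positive when all three
happen to line up by accident.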
