Re: Durability of appendable metadata for Log Block Manager

Mike Percy Tue, 05 Apr 2016 01:08:52 -0700

It looks to me like the WAL suffers from an issue very similar to
KUDU-1377, where we could be more forgiving when a partial record is
written. The WAL will rebuild a segment footer for you, but it won't ignore
a partial record. However it's less likely because the WAL uses
preallocation.


Mike

On Tue, Apr 5, 2016 at 12:38 AM, Adar Dembo <[email protected]> wrote:

> Quick note: the existing checksum is actually of the length AND the bytes,
> not just the bytes. But that doesn't help solve your problem, because the
> checksum follows the data, which means if the length is corrupted, you
> don't really know where the checksum lies.
>
> Isn't the solution you're describing effectively the same as what's
> described in KUDU-668? That closely mirrors what the WAL does, which I'd
> argue is a good thing (since it's tried and true).
>
> On Tue, Apr 5, 2016 at 12:14 AM, Mike Percy <[email protected]> wrote:
>
> > I'm working on solving an issue in the Log Block Manager (LBM) filed as
> > KUDU-1377. In this case, writing a partial record to the end of a
> container
> > metadata file can cause a failure to restart Kudu. One possible wversay
> for
> > this to happen is to run out of disk space when appending a block id to
> > this metadata file. I wanted to discuss a potential fix for this issue.
> >
> > The LBM uses the protobuf container (PBC) file format documented in
> > pb_util.h for the container metadata file. The current version of this
> > format (V1) looks like this:
> >
> >
> > <magic number>
> > <container version>
> > repeated <record>
> >
> >
> > with each <record> looking like the following:
> >
> > <uint32 data length>
> > <data bytes>
> > <data checksum>
> >
> >
> > In the LBM, each time we create a new block we append the block id to the
> > metadata file. On startup, we verify that all records in the file are
> > valid. If not, we print to the log and exit.
> >
> > In the case of a full disk, we will have written a partial record to the
> > metadata file and at startup we will fail validation, however we should
> be
> > able to detect this case and ignore the partial record on startup.
> Because
> > we still need to support deleting blocks, we need to be able to continue
> > appending to this metadata file after startup, so we also need to
> truncate
> > the file to the last good record when this occurs.
> >
> > Here is what I am thinking about to fix this issue:
> >
> > 1. When we are reading a container metadata file at startup, if we detect
> > that there is a trailing record that is too short to fit a valid record
> > (relative to the length of the file) then we truncate the last partial
> > record from the file and continue as normal.
> >
> > 2. To avoid truncating "good" records in the case that there is data
> > corruption in one of the length fields, we also need to extend the PBC
> > format to add a checksum for the record length. So a record would now
> look
> > like the following:
> >
> > <uint32 data length>
> >
> > <length checksum>
> >
> > <data bytes>
> > <data checksum>
> >
> >
> > Does anyone see any drawbacks to this approach?
> >
> > If you made it this far, thanks for reading.
> >
> > Mike
> >
>

Re: Durability of appendable metadata for Log Block Manager

Reply via email to