It looks to me like the WAL suffers from an issue very similar to KUDU-1377, where we could be more forgiving when a partial record is written. The WAL will rebuild a segment footer for you, but it won't ignore a partial record. However it's less likely because the WAL uses preallocation.
Mike On Tue, Apr 5, 2016 at 12:38 AM, Adar Dembo <[email protected]> wrote: > Quick note: the existing checksum is actually of the length AND the bytes, > not just the bytes. But that doesn't help solve your problem, because the > checksum follows the data, which means if the length is corrupted, you > don't really know where the checksum lies. > > Isn't the solution you're describing effectively the same as what's > described in KUDU-668? That closely mirrors what the WAL does, which I'd > argue is a good thing (since it's tried and true). > > On Tue, Apr 5, 2016 at 12:14 AM, Mike Percy <[email protected]> wrote: > > > I'm working on solving an issue in the Log Block Manager (LBM) filed as > > KUDU-1377. In this case, writing a partial record to the end of a > container > > metadata file can cause a failure to restart Kudu. One possible wversay > for > > this to happen is to run out of disk space when appending a block id to > > this metadata file. I wanted to discuss a potential fix for this issue. > > > > The LBM uses the protobuf container (PBC) file format documented in > > pb_util.h for the container metadata file. The current version of this > > format (V1) looks like this: > > > > > > <magic number> > > <container version> > > repeated <record> > > > > > > with each <record> looking like the following: > > > > <uint32 data length> > > <data bytes> > > <data checksum> > > > > > > In the LBM, each time we create a new block we append the block id to the > > metadata file. On startup, we verify that all records in the file are > > valid. If not, we print to the log and exit. > > > > In the case of a full disk, we will have written a partial record to the > > metadata file and at startup we will fail validation, however we should > be > > able to detect this case and ignore the partial record on startup. > Because > > we still need to support deleting blocks, we need to be able to continue > > appending to this metadata file after startup, so we also need to > truncate > > the file to the last good record when this occurs. > > > > Here is what I am thinking about to fix this issue: > > > > 1. When we are reading a container metadata file at startup, if we detect > > that there is a trailing record that is too short to fit a valid record > > (relative to the length of the file) then we truncate the last partial > > record from the file and continue as normal. > > > > 2. To avoid truncating "good" records in the case that there is data > > corruption in one of the length fields, we also need to extend the PBC > > format to add a checksum for the record length. So a record would now > look > > like the following: > > > > <uint32 data length> > > > > <length checksum> > > > > <data bytes> > > <data checksum> > > > > > > Does anyone see any drawbacks to this approach? > > > > If you made it this far, thanks for reading. > > > > Mike > > >
