2017-09-19 21:04 GMT+03:00 Marat Khalili <m...@rqc.ru>:
> Would be cool, but probably not wise IMHO, since on modern hardware you
> almost never get one-bit errors (usually it's a whole sector of garbage), and
> therefore you'd more often see an incorrect recovery than actually fixed bit.
> --
>
> With Best Regards,
> Marat Khalili

Over the past 2 months, I've been thinking about a parity solution for
btrfs that would be a trade-off between the full-duplication and single
profiles. Something like a variable stripe length for meta(data), to fix
errors within one sector; that is, calculate a one-time parity for each
written extent.

But for now, I think this cannot be done without tricks or changing the
format. (If you do it at the FS level, i.e. use space in the B-tree, it
creates one more exception; if you do it at the block level, it needs a
"new" raid5, which is not cool.)
But these are only my thoughts.

Then I recalled that CRC in theory allows fixing a one-bit error, but for
now I do not have enough knowledge about CRC to try to implement a proof
of concept for this =\.
(I think one-bit CRC correction could be useful not only in btrfs code.)

But I also remember that btrfs has a lot of unused space in the checksum
tree: 32 bytes in the checksum field, 4 bytes used by CRC32C => 28 bytes
of freedom =)

So for now I'm thinking about storing a parity together with the checksum
data.

Proof of concept (code):
https://github.com/Nefelim4ag/CRC32C_AND_8Byte_parity

As I see it:
1. Btrfs calculates 8/16 bytes of parity and 4 bytes of CRC32C;
   the 8/16 parity bytes are stored at the end of the checksum field.
2. Compatibility bits?
   Reasons against:
   - For an old kernel nothing changes, and the same goes for old
     btrfs-progs.
   - It is possible to silently assume that the parity is present and try
     to use it for the fixup, because if it is missing or broken we just
     fall back to the old behaviour.
   Reason for:
   - Maybe we need some way to show that btrfs has parity + CRC32 for
     this data?
3. On x86_64, this is comparably fast to HW CRC32:
   --- Checking speed of hash / parity functions ---
   PAGE_SIZE: 4096, number of cycles: 1048576
   Parity64: 0xf7182ccbfc34f088  perf: 233750 usec, th: 18374.191641 MiB / s
   parity32: 0xb2cdc43           perf: 464824 μs, th: 9239.986094 MiB / s
   crc32:    0xa4aa10b2          perf: 312446 μs, th: 13746.270703 MiB / s
   xxhash64: 0x77e7064e1a16f422  perf: 367570 μs, th: 11684.760171 MiB / s
4. If a CRC mismatch is detected, try to correct the data using the parity
   (for the single profile only):
   4.1. Make a tmp copy of the data.
   4.2. Assume that block / stripe 0+N is damaged and recompute that
        block from the parity.
   4.3. Check the CRC for the page:
        - mismatch? -> N + 1 -> go to 4.1
        - match? -> Hooray! -> overwrite the broken block

This solution would easily fix most kinds of bit flips and -local-
corruption of 1 to 16 bytes.

Possible parity combinations:
1 byte: x1 or x2 or x4 or x8 or x16
2 byte: x1 or x2 or x4 or x8
4 byte: x1 or x2 or x4
8 byte: x1 or x2 - fastest on x86_64 (I don't have other CPUs to test on)

What do you think about that?
Thanks.

P.S.
Script for reproducing the 1-bit error case, where the FS can't be mounted:

    #!/bin/bash
    # Needs root (mount) and GNU sed (in-place edit of a binary image).
    DISK_IMAGE="$(mktemp)"
    MNT_TMP_DIR="$(mktemp -d)"

    # Small image with a single (non-duplicated) metadata profile.
    truncate -s 48M "$DISK_IMAGE"
    mkfs.btrfs -f -L CRC_TEST -m single "$DISK_IMAGE"

    mount "$DISK_IMAGE" "$MNT_TMP_DIR"
    echo "Test String: some_text_data" | tee "$MNT_TMP_DIR/file.txt"
    umount "$MNT_TMP_DIR"

    echo "Add 1 bit error: o -> n"
    sed -i 's/some_text_data/snme_text_data/g' "$DISK_IMAGE"
    btrfs check -b -p "$DISK_IMAGE"

    echo "Fix 1 bit error: n -> o"
    sed -i 's/snme_text_data/some_text_data/g' "$DISK_IMAGE"
    btrfs check -b -p "$DISK_IMAGE"

    # With the bit restored, the image mounts and the file is readable.
    mount "$DISK_IMAGE" "$MNT_TMP_DIR"
    cat "$MNT_TMP_DIR/file.txt"
    umount "$MNT_TMP_DIR"

    rm -fv "$DISK_IMAGE"

-- 
Have a nice day,
Timofey.
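
For illustration, the recovery loop from step 4 could look roughly like the sketch below. This is not the code from the proof-of-concept repository. The names parity64() and try_repair() are made up for this example, zlib.crc32 (plain CRC32) stands in for CRC32C because the Python standard library has no CRC32C, and only the "8 byte: x1" parity variant is shown. It also only handles damage confined to one aligned 8-byte word.

```python
import struct
import zlib

WORD = 8  # parity width in bytes (the "8 byte: x1" variant)


def parity64(page: bytes) -> int:
    """XOR all little-endian 8-byte words of the page together."""
    acc = 0
    for (word,) in struct.iter_unpack("<Q", page):
        acc ^= word
    return acc


def try_repair(page: bytes, stored_crc: int, stored_parity: int):
    """Return a page whose checksum matches stored_crc, or None.

    Assumption: zlib.crc32 is a stand-in for the CRC32C btrfs uses.
    """
    if zlib.crc32(page) == stored_crc:
        return page  # nothing to fix

    # If exactly one word is damaged, this is (original word XOR damaged word).
    delta = stored_parity ^ parity64(page)

    for off in range(0, len(page), WORD):
        # 4.1 / 4.2: work on a copy and assume the word at `off` is the bad one.
        cur = int.from_bytes(page[off:off + WORD], "little")
        fixed = (cur ^ delta).to_bytes(WORD, "little")
        candidate = page[:off] + fixed + page[off + WORD:]
        # 4.3: accept the guess only if the checksum now matches.
        if zlib.crc32(candidate) == stored_crc:
            return candidate
    return None  # damage is wider than one word, fall back to old behaviour


# Usage: flip one bit in a 4096-byte page and recover it.
good = bytes(range(256)) * 16
crc, par = zlib.crc32(good), parity64(good)
bad = bytearray(good)
bad[1000] ^= 0x04
assert try_repair(bytes(bad), crc, par) == good
```

Since the parity is a plain XOR, a wrong guess is only ever accepted if the checksum happens to match as well, which is the incorrect-recovery risk raised in the quoted reply.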