On Wed, May 17, 2023 at 11:06:59AM +0100, Richard W.M. Jones wrote: > This filter adds random data corruption when reading from the > underlying plugin. > ---
> +=head1 VERSION > + > +C<nbdkit-pause-filter> first appeared in nbdkit 1.36. Missed touching up this part of your copy-paste. > +static enum mode evil_mode = STUCK_BITS; > +static double evil_probability = -1; /* default depends on mode */ > +static double evil_stuck_probability = 1.0; > +static uint32_t evil_seed; > + > +/* Probabilities < ε are treated as zero to avoid both divide by zero > + * problems and potentially exploding values in calculations. > + */ > +#define EPSILON 1e-12 > + > +/* Probabilities > MAXP are treated as 100%. This is because our > + * algorithm below can corrupt at most 1 bit per byte and doesn't make > + * progress otherwise. > + */ > +#define MAXP (1.0/8.0) These are reasonable constraints given your algorithm. > + > +/* This is the heart of the algorithm, the function which corrupts > + * the buffer after reading it from the plugin. > + * > + * The observation is that if we have a block of (eg) size 10**6 bits > + * and our probability of finding a corrupt bit is (eg) 1/10**4, then > + * we expect approximately 100 bits in the block to be corrupted. > + * > + * For stuck bits we want the corrupted bits to be the same on each > + * access, either relative to the backing disk (STUCK_BITS) or to the > + * request (STUCK_WIRES). > + * > + * Instead of creating an expensive bitmap ahead of time covering the > + * whole disk, we can use the random number generator with a fixed > + * seed derived from the offset of the start of the block. We can > + * then choose a random number uniformly in the range [0..2*(1/P)] (in > + * the example [0..2*10**4]) as the distance to the next corrupt bit. > + * We jump forwards, corrupt that bit, and repeat until we reach the > + * end of the block. > + * > + * "Corrupted" in this case can mean flipped by cosmic rays or stuck, > + * depending on the filter mode. > + * > + * On average this will choose the right number of bits in the block. > + * (Although their distribution will be suboptimal. In a uniform > + * distribution it should be possible for two corrupted bits to be > + * greater than 2*(1/P) apart, but the above algorithm would not do > + * this. In practice this probably doesn't matter.) Likewise, in theory with a uniform distribution, it should be possible (although generally improbable) for any two adjacent bits to be corrupted, but in practice adjacent corruption is only possible for the last bit in one byte followed by the first bit in the next byte, since MAXP is chosen so that we always resume at the next byte after performing a corruption (so only 1 out of 8 positions is even a candidate for an adjacent corruption). One of the interesting points that comes out of the study of information theory and checksumming algorithms is the design of algorithms that can detect or even correct a small run of consecutively corrupt bits, since corruptions do tend to occur in bursts. For example, CRC32 [1] can detect 100% of bursts of < 32 consecutive bit corruptions if that is the only corruption in the packet (it takes either longer bursts, or more than one disjoint corruption, before you can run into a false negative where CRC says the overall packet was good despite the corruption). As written, the evil filter has a maximum burst of 1; maybe adding a maximum-burst parameter (find the bit position(s) to inject errors at, then randomly choose the burst size of the error at that point) might be a worthwhile future addition. But I don't see it as being essential to getting a first round of the filter committed. If we do add a burst parameter, you may also want to consider whether your current choice of a 50/50 stuck high vs. stuck low may need to consider more failure modes (entire burst stuck high, entire burst stuck low, all bits in the burst inverted, or a random number used as the xor mask for which bits in the burst to flip...). Do we care about cross-version compatibility (the same bits will be corrupt for a deterministic seed, regarless of future changes to the filter), or is it merely single-run consistency (for this build of nbdkit, a deterministic seed flips this particular set of bits, but the set of bits flipped may change in the next build of nbdkit when we start consuming randomness differently in order to account for more corruption modes). (I'd probably lean to explicitly documenting that we do not guarantee cross-version determinism; leaving the door open to consume randomness differently in the future.) [1] https://en.wikipedia.org/wiki/Cyclic_redundancy_check > + > +static int > +evil_get_ready (int thread_model) > +{ > + switch (evil_mode) { > + case COSMIC_RAYS: > + xsrandom ((uint64_t) evil_seed, &state); > + break; > + > + case STUCK_BITS: > + case STUCK_WIRES: > + ; > + } > + > + /* Choose the block size based on the probability, so that at least > + * 100 bits are expected to be corrupted in the block. Block size > + * must be a power of 2. > + * > + * Example: P = 1e-4 > + * => ideal block_size = 100 / 1e-4 = 1e6 (bits) = 1e6 / 8 (bytes) > + * => next power of 2 block_size = 131072 = 2**17 > + * => expected bits per block = ~104 > + */ > + if (evil_probability < EPSILON || evil_probability > MAXP) > + block_size = 1024*1024; /* unused so value doesn't matter */ > + else > + block_size = next_power_of_2 ((uint64_t) (100. / evil_probability) / 8); > + > + nbdkit_debug ("evil: mode: %s, P: %lg, seed: %" PRIu32, > + evil_mode_to_string (evil_mode), > + evil_probability, evil_seed); > + nbdkit_debug ("evil: block_size: %" PRIu64 " (2**%d)", > + block_size, log_2_bits (block_size)); > + nbdkit_debug ("evil: expected bits per block: %g", > + 8 * block_size * evil_probability); Definitely a useful addition over v1. > + > + return 0; > +} > + > +static void corrupt_all_bits (uint8_t *buf, uint32_t count, > + struct random_state *rs, > + enum corruption_type ct); > +static uint8_t corrupt_one_bit (uint8_t byte, unsigned bit, > + uint64_t rand, enum corruption_type ct); > + > +static void > +corrupt_buffer (uint8_t *buf, uint32_t count, uint64_t offset_in_block, > + struct random_state *rs, enum corruption_type ct) > +{ > + /* No corruption, and avoids a divide by zero below. */ > + if (evil_probability < EPSILON) return; > + > + /* 100% corruption, avoids lack of progress in the loop below. */ > + if (evil_probability > MAXP) { > + corrupt_all_bits (buf, count, rs, ct); > + return; > + } > + > + uint64_t offs, intvl, i, rand; Since 'rand' is also reserved as the name of the <stdlib.h> function, you may want to pick a different name to this variable to avoid -Wshadow warnings. > + const uint64_t invp2 = (uint64_t) (2.0 / evil_probability); > + > + assert ((offset_in_block & ~(block_size-1)) == 0); > + > + /* Iterate over the whole block from the start. */ > + for (offs = 0; offs < offset_in_block + count; ) { > + /* Choose the length of the interval to the next corrupted bit, by > + * picking a random number in [0..2*(1/P)]. > + * > + * Remember this is in bits! > + */ > + intvl = xrandom (rs) % invp2; > + > + /* Consume one more random state. We may or may not use this. > + * But we need to always consume two random states per iteration > + * to make the output predictable. > + */ > + rand = xrandom (rs); > + > + /* Adjust offs to that byte. */ > + offs += intvl / 8; > + > + /* If we have gone past the end of buffer, stop. */ > + if (offs >= offset_in_block + count) break; > + > + /* If the current offs lies within the buffer, corrupt a bit. */ > + if (offs >= offset_in_block) { > + i = offs - offset_in_block; > + assert (i < count); > + buf[i] = corrupt_one_bit (buf[i], intvl & 7, rand, ct); > + } > + } > +} > + > +static void > +corrupt_all_bits (uint8_t *buf, uint32_t count, > + struct random_state *rs, enum corruption_type ct) > +{ > + size_t i; > + unsigned bit; > + uint64_t rand; > + > + /* This is used when MAXP < P <= 100%. We treat it the same as 100% > + * and corrupt all bits. > + */ > + for (i = 0; i < count; ++i) { > + for (bit = 0; bit < 8; ++bit) { > + rand = xrandom (rs); > + buf[i] = corrupt_one_bit (buf[i], bit, rand, ct); > + } > + } Is this actually corrupting all bits, or corrupting 1 bit per byte? I found the function name a bit confusing. (Doing XOR with -1 to invert all bits is not the same as choosing a random bit pattern but where only about 50% of the bits get flipped - but both behaviors can prove interesting to see how filesystems cope with such corruptions) > +} > + > +static uint8_t > +corrupt_one_bit (uint8_t byte, unsigned bit, > + uint64_t rand, enum corruption_type ct) Another use of 'rand' that might be confusing given the <stdlib.h> function. > +{ > + const unsigned mask = 1 << bit; > + > + switch (ct) { > + case FLIP: > + byte ^= mask; > + break; > + case STUCK: > + rand &= 0xffffffff; > + if (evil_stuck_probability * 0x100000000 > rand) { > + if (rand & 1) /* stuck high or low? */ > + byte |= mask; > + else > + byte &= ~mask; > + } > + } > + return byte; > +} And I ran out of review time today, before finishing reading the rest of your patch. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org _______________________________________________ Libguestfs mailing list Libguestfs@redhat.com https://listman.redhat.com/mailman/listinfo/libguestfs