Followup-For: Bug #837964
I've tried to reproduce it using some (small) logical volumes and could
not. However, I've stared at the strace output from adding sdc3 with
both the old (working) and new (broken) versions long enough to have
some idea what's going on: it appears mdadm is writing too much bitmap,
and thus overwriting the superblock.
I'm attaching the diff between the two runs. I'm happy to send along the
full strace runs (I actually have four: two for each version, to see
how much two runs vary and so exclude those differences from the
comparison), but I'd prefer to send them somewhere not publicly archived
in case they contain something sensitive.
The commands run to make these files were:
mdadm -r /dev/md/pv0 /dev/sdc3 # clean up
mdadm --zero-superblock /dev/sdc3 # clean up
strace -o add-1-v3.4-2 -y -e write=6 -- mdadm -a /dev/md/pv0 /dev/sdc3 # fails
strace -o add-1-v3.3.2 -y -e write=6 -- /root/ugh/sbin/mdadm -a /dev/md/pv0 /dev/sdc3 # succeeds
diff -dbU10 add-1-v3.3.2 add-1-v3.4 > /tmp/mdadm-diff
- line 417: leftover data from the 3.4 test; I don't think this matters.
Adding twice in a row with 3.4 fails both times; trying 3.3.2 first
(after zero-superblock) works.
- line 496 appears to be the superblock write: 1024 bytes starting at
999939055616. The data differs, but it also differs between two runs of
3.3.2. I'm guessing it's things like UUIDs differing and not
important (but I haven't bothered to confirm).
- line 556: this looks like part of the bitmap write-out. There is a
convenient (for us) lseek to check the position, giving 999939054592.
3.3.2 writes 1024 bytes here, ending at 999939055615, leaving the
position at the start of the superblock. 3.4 instead writes 4k,
which of course overwrites the superblock.
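
The offset arithmetic behind the last two items can be double-checked with a few lines of Python. This is only a sketch: the offsets are the ones quoted above, "4k" is assumed to mean 4096 bytes, and the helper name overlap is made up for illustration.

```python
# Offsets quoted from the strace diff; "4k" is assumed to be 4096 bytes.
BITMAP_WRITE_START = 999939054592  # position reported by lseek before the bitmap write
SUPERBLOCK_START = 999939055616    # start of the superblock write
SUPERBLOCK_LEN = 1024              # size of the superblock write


def overlap(start, length, other_start, other_len):
    """Return how many bytes two half-open byte ranges have in common."""
    end = start + length
    other_end = other_start + other_len
    return max(0, min(end, other_end) - max(start, other_start))


# 3.3.2 writes 1024 bytes of bitmap, 3.4 writes 4096 bytes (assumed).
old = overlap(BITMAP_WRITE_START, 1024, SUPERBLOCK_START, SUPERBLOCK_LEN)
new = overlap(BITMAP_WRITE_START, 4096, SUPERBLOCK_START, SUPERBLOCK_LEN)

print("3.3.2 bitmap write overlaps superblock by", old, "bytes")
print("3.4 bitmap write overlaps superblock by", new, "bytes")
```

If the assumptions hold, the 1024-byte write stops exactly at the superblock boundary, while the 4096-byte write covers all 1024 bytes of the superblock.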
Not sure *why* it's writing too much bitmap, but it's overwriting the
superblock thus causing the failure.
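
To look for other writes that run into the superblock without reading the full traces by eye, the lseek/write calls could be replayed with something like the sketch below. It assumes the usual strace line layout for lseek and write (as produced with -y), ignores pwrite and other positioned writes, and the two sample lines are hand-written stand-ins rather than lines copied from the real traces. The function name writes_running_into is hypothetical.

```python
import re

# Assumed strace line shapes, e.g.:
#   lseek(6</dev/sdc3>, 999939054592, SEEK_SET) = 999939054592
#   write(6</dev/sdc3>, "..."..., 1024) = 1024
LSEEK_RE = re.compile(r'^lseek\((\d+)[^,]*, (-?\d+), \w+\)\s*= (\d+)')
WRITE_RE = re.compile(r'^write\((\d+).*, (\d+)\)\s*= (\d+)')


def writes_running_into(lines, fd, boundary):
    """Return (offset, length) for writes on fd that start before
    boundary and end after it, tracking the position via lseek/write."""
    pos = None
    hits = []
    for line in lines:
        m = LSEEK_RE.match(line)
        if m and int(m.group(1)) == fd:
            pos = int(m.group(3))  # resulting offset returned by lseek
            continue
        m = WRITE_RE.match(line)
        if m and int(m.group(1)) == fd and pos is not None:
            written = int(m.group(3))  # bytes actually written
            if pos < boundary < pos + written:
                hits.append((pos, written))
            pos += written
    return hits


# Hand-written sample lines (not taken from the real traces).
sample = [
    'lseek(6</dev/sdc3>, 999939054592, SEEK_SET) = 999939054592',
    'write(6</dev/sdc3>, "..."..., 4096) = 4096',
]
print(writes_running_into(sample, 6, 999939055616))
```

On a real trace, the lines would come from something like open("add-1-v3.4-2"), and the hex dump lines produced by -e write=6 are skipped because they match neither pattern.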
In case it helps:
# blockdev --getsize64 /dev/sdc3
# blockdev --getsz /dev/sdb3