Hi everyone!
While setting up a new box with RAID I decided to stress-test the RAID
BEFORE losing anything important.
To that end I set up a test VM and did the following.
Install OpenBSD 7.1, nothing special.
I.e. de keyboard, hostname foo, default network cfg, no SSH, no X, no
com0, no users, default timezone, default disk setup, default file sets.
Wipe[1] the three additional disks (the ones the system is not installed on).
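For reference, the wipe was along these lines (a sketch; the device
names sd1..sd3 and the 1 MB size are my assumptions here, so
triple-check before running as root):

```shell
# Zero the start of each spare disk so no MBR/GPT or old disklabel
# survives. The raw "c" partition covers the whole disk on OpenBSD.
for d in sd1 sd2 sd3; do
  dd if=/dev/zero of=/dev/r${d}c bs=1m count=1
done
```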
Run syspatch.
Run syspatch.
Run syspatch.
Run sysupgrade. -> 7.2
Run syspatch.
Run syspatch.
Reboot[1].
Run disklabel -E for each RAID disk:
D ENTER      (delete all partitions)
a ENTER      (add a partition)
ENTER        (partition: a)
ENTER        (offset: default)
ENTER        (size: rest of the disk)
RAID ENTER   (FS type)
q ENTER      (quit)
ENTER        (write the label)
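The same session can also be fed non-interactively; a sketch, assuming
the blank here-doc lines accept the defaults exactly like the
keystrokes above (sd1..sd3 are assumed device names):

```shell
# One RAID partition per disk, all defaults accepted. The here-doc
# lines mirror the keystrokes: delete all, add, accept partition
# letter/offset/size, FS type RAID, quit, confirm write.
for d in sd1 sd2 sd3; do
  disklabel -E "$d" <<EOF
D
a



RAID
q

EOF
done
```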
Assemble RAID5: bioctl -c 5 -l sd1a,sd2a,sd3a softraid0
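After assembly it's worth confirming the volume came up; this is the
same command used further below to see the degraded state (sd4 is the
device the volume happened to attach as in my VM):

```shell
# The softraid volume attaches as a new sd device and should report
# itself Online, with the member disks listed below it.
bioctl sd4
```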
Run disklabel -E sd4:
D ENTER      (delete all partitions)
a ENTER      (add a partition)
ENTER        (partition: a)
ENTER        (offset: default)
ENTER        (size: rest of the disk)
ENTER        (FS type: 4.2BSD)
q ENTER      (quit)
ENTER        (write the label)
Make FS: newfs sd4a
Mount it: mount /dev/sd4a /mnt
Put some files on it.
---
[1] This way the OS definitely sees no MBRs and any new disklabels will
be plain BSD ones. By the way, I previously tried GPT (fdisk -gy) with a
BSD disklabel on top (disklabel -E) (as I'm on an x86 box), both on the
three disks and on the virtual RAID5 "disk", before creating the FS.
The writes I used to "Put some files on it" (F3 or the equivalent,

  while dd bs=32M count=32 if=/dev/urandom of=/mnt/$(date +%s); do true; done

) froze the whole pre-7.2 snapshot OS (not even a ddb> prompt). Sure,
why use (or even support) nested partition tables on non-boot disks?
(And I've "solved" the problem by using plain BSD disklabels.) But a
complete OS freeze, despite su nobody and renice -n 19? Really?
Btw. thanks to those of my BCCed colleagues who let me test this on
several boxes at work. I couldn't reproduce the freeze there, but hey, I
tried. (Have I already cursed the one box whose hw RAID ctrler doesn't
support OpenBSD?)
---
Reboot.
Mount.
Data is still there.
Good.
Boot a Linux and wipe one of the three disks.
Boot OpenBSD again.
bioctl sd4 shows the volume degraded; one of the three disks is missing.
Okay...
New disklabel on the wiped disk, bioctl -R /dev/sd3a sd4, and get a
coffee...[3]
---
[3] In some cases, I think on 7.1-stable and/or a pre-7.2 snapshot with
nested partition tables as mentioned above, sd4 wasn't even present, so
I couldn't do bioctl -R. I had to use bioctl -c ... .
---
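While the coffee brews, the rebuild can be watched; a sketch, assuming
the volume is still sd4 and that bioctl's status output reflects the
rebuild progress (interrupt with Ctrl-C when done):

```shell
# Poll the softraid volume status every 30 seconds during the rebuild.
while :; do
  bioctl sd4
  sleep 30
done
```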
Rebuild done? Reboot!
Mount the FS again.
Type cat /mnt/<TAB>... er, what?
At this point I had actually planned to write that the data is still
there IF(!) you didn't mount anything before[2] the rebuild+reboot.
And then I saw this:
https://files.al2klimov.de/s/z8JW8py2KM5nbEB
(Come. On. This isn't even funny. :'( The only funny thing is that the
box I'm doing all this for is supposed to become the backup server for
the box behind files.al2klimov.de. So if the latter is gone, the
pictures linked above are gone too, because there's no backup on the
server I can't set up until I'm sure the RAID is OK, which is what the
crash pictures are for.)
Are the screenshots enough? Do you need anything more? The VM is still
on ddb> .
---
[2] Otherwise you should get a 1 GB file - not a dir! - on /mnt. But as
you've already read, I was interrupted by a crash.
---
Have I already mentioned that a SATA cable pulled out while the system
is running gets handled well? (Degraded, data still there, at least
until reboot.)
Best,
A/K