Storage 'failover' largely kills FreeBSD 10.x under XenServer?

Karl Pielorz Wed, 20 Sep 2017 03:41:32 -0700


Hi All,

We recently experienced an "unplanned storage" failover on our XenServer pool. The pool is based on XenServer 7.1 (on certified HP kit) and runs a mix of FreeBSD VMs (all 10.3 based, except for one legacy 9.x VM) and a few Windows VMs. Storage is provided by two Citrix-certified Synology storage boxes.

During the failover, Xen sees the storage paths go down and come up again (re-attaching them once they are available). By our timing this takes around a minute, worst case.


The process killed 99% of our FreeBSD VMs :(

The older 9.x FreeBSD box survived, as did all the Windows VMs.

Is there some 'tunable' we can set to make the 10.3 boxes more tolerant of the I/O delays that occur during a storage failover?

I've enclosed some of the errors we observed below. I realise a full storage failover is a 'stressful time' for VMs, but the Windows VMs and the earlier FreeBSD version survived without issue. All the 10.3 boxes logged I/O errors and then panicked and rebooted.

We've set up a test lab with the same kit and can now replicate this at will: every time, most or all of the FreeBSD 10.x boxes panic and reboot, while the Windows VMs survive. So we can test any potential fixes.
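Since the lab can reproduce the failure, a small guest-side probe could measure how long writes actually stall inside a VM during a failover, for comparison with the roughly one minute seen from the Xen side. This is a minimal Python sketch, not an existing tool: the function name `probe_write_latency`, the 4 KiB block and the one-second default interval are assumptions chosen for illustration.

```python
import os
import time

def probe_write_latency(path, iterations, interval=1.0, block=b"x" * 4096):
    """Hypothetical probe: rewrite and fsync one block per iteration.

    Returns a list of (latency_seconds, errno_or_None) tuples, one per
    iteration. A failed write/fsync records its errno (e.g. 5, EIO,
    which matches the 'error = 5' in the logs) instead of raising.
    """
    results = []
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        for _ in range(iterations):
            start = time.monotonic()
            err = None
            try:
                os.pwrite(fd, block, 0)  # overwrite the same block each time
                os.fsync(fd)             # force it down to the (virtual) disk
            except OSError as exc:
                err = exc.errno
            results.append((time.monotonic() - start, err))
            time.sleep(interval)
    finally:
        os.close(fd)
    return results
```

Run inside a guest for the duration of a lab failover, the largest latency (or the first non-None errno) shows how long the guest's I/O was blocked before it either recovered or gave up.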

So if anyone can suggest anything we can tweak to minimise the chances of this happening (i.e. make I/O more tolerant of timeouts, or set larger timeouts), that'd be great.


Thanks,

-Karl


Errors we observed:

ada0: disk error cmd=write 11339752-11339767 status: ffffffff

ada0: disk error cmd=write 11340544-11340607 status: ffffffff
g_vfs_done():gpt/root[WRITE(offset=4731097088, length=8192)]error = 5

(repeated a couple of times with different values)

Machine then goes on to panic:

g_vfs_done():panic: softdep_setup_freeblocks: inode busy
cpuid = 0
KDB: stack backtrace:
#0 0xffffffff8098e810 at kdb_backtrace+0x60
#1 0xffffffff809514e6 at vpanic+0x126
#2 0xffffffff809513b3 at panic+0x43
#3 0xffffffff80b9c685 at softdep_setup_freeblocks+0xaf5
#4 0xffffffff80b86bae at ffs_truncate+0x44e
#5 0xffffffff80bbec49 at ufs_setattr+0x769
#6 0xffffffff80e81891 at VOP_SETATTR_APV+0xa1
#7 0xffffffff80a053c5 at vn_truncate+0x165
#8 0xffffffff809ff236 at kern_openat+0x326
#9 0xffffffff80d56e6f at amd64_syscall+0x40f
#10 0xffffffff80d3c0cb at Xfast_syscall+0xfb


Another box also logged:

ada0: disk error cmd=read 9970080-9970082 status: ffffffff
g_vfs_done():gpt/root[READ(offset=4029825024, length=1536)]error = 5
vnode_pager_getpages: I/O read error
vm_fault: pager read error, pid 24219 (make)

And again, went on to panic shortly thereafter.

I had to hand-transcribe the above from screenshots and video, so apologies if any errors crept in.
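Because the logs were hand-transcribed, one cheap consistency check is to compare the sector range in a `disk error` line with the byte length in the corresponding `g_vfs_done()` line. The sketch below assumes 512-byte sectors and that the printed sector range includes both ends; the helper names are hypothetical.

```python
import re

SECTOR_SIZE = 512  # assumption: 512-byte logical sectors on ada0

# "cmd=read 9970080-9970082" -> first and last sector (inclusive)
DISK_ERR = re.compile(r"cmd=(read|write) (\d+)-(\d+)")
# "[READ(offset=4029825024, length=1536)]" -> offset and length in bytes
GVFS = re.compile(r"\[(READ|WRITE)\(offset=(\d+), ?length=(\d+)\)\]")

def sector_range_bytes(line):
    """Byte count covered by the inclusive sector range in a 'disk error' line."""
    m = DISK_ERR.search(line)
    if m is None:
        raise ValueError("no sector range found")
    first, last = int(m.group(2)), int(m.group(3))
    return (last - first + 1) * SECTOR_SIZE

def gvfs_length(line):
    """Request length in bytes from a g_vfs_done() error line."""
    m = GVFS.search(line)
    if m is None:
        raise ValueError("no g_vfs_done() request found")
    return int(m.group(3))

if __name__ == "__main__":
    disk = "ada0: disk error cmd=read 9970080-9970082 status: ffffffff"
    gvfs = "g_vfs_done():gpt/root[READ(offset=4029825024, length=1536)]error = 5"
    print(sector_range_bytes(disk), gvfs_length(gvfs))  # prints: 1536 1536
```

For the read error above, 9970080-9970082 is three sectors, i.e. 1536 bytes, which agrees with `length=1536`; the first write error (11339752-11339767, 16 sectors) likewise works out to 8192 bytes.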

I'm hoping there's just a magic sysctl or kernel option we can set to raise the timeouts (if it is as simple as timeouts killing things)?

_______________________________________________
freebsd-xen@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-xen
To unsubscribe, send any mail to "freebsd-xen-unsubscr...@freebsd.org"
