I'm doing some powerfail recovery testing on a storage array over iSCSI. Host is RHEL 6.4 kernel 2.6.32-358.el6.x86_64.
With fio 2.1.2 -> 2.1.4 the job file below rides through the disks going away, and continues I/O after they come back, without reporting any errors. With fio 2.1.5 -> 2.1.8 when the disks come back fio immediately reports a meta verification error.
I captured a trace with an finisar analyzer, and can see that after the disks come back and the host logs back in, a read is issued for an lba which was never written to. Since I don't see verification errors outside of the powerfail testing, I suspect fio isn't correctly handling failed writes during the time the disks are unavailable.
The trace file is rather large, but I can make it available if you need to see it.
[whee] bs=8k thread=4 time_based=1 runtime=864000 readwrite=randrw direct=1 iodepth=128 ioengine=libaio size=100% verify=meta do_verify=1 verify_fatal=1 verify_dump=1 verify_backlog=8192 buffer_compress_percentage=95 ignore_error=ENODEV:EIO,ENODEV:EIO,ENODEV:EIO filename=/dev/mapper/lun0 . . filename=/dev/mapper/lun9 -- To unsubscribe from this list: send the line "unsubscribe fio" in the body of a message to [email protected] More majordomo info at http://vger.kernel.org/majordomo-info.html
