Hello Developer@

A few days ago, I posted an issue with a corrupted ZFS volume after we performed an upgrade to the freebsd-fs list:
http://lists.freebsd.org/pipermail/freebsd-fs/2014-June/019537.html

After a few days of running on v5000, one of the two servers we upgraded panicked, rebooted, and then panicked again while mounting one of the zpool's volumes.

Steve Hartland provided a patch to sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_misc.c which, after a compile and install, reported this upon mount:

    Solaris: WARNING: dva_get_dsize_sync(): bad DVA 131241:2147483648
    Solaris: WARNING: dva_get_dsize_sync(): bad DVA 131241:2147483648
    Solaris: WARNING: dva_get_dsize_sync(): bad DVA 131241:2147483648

Interestingly, his patch allowed me to access the data (and I've been able to 
recover some data). If I try to remove that filesystem, I get another kernel 
panic...

A few days later, the second server (upgraded that very same morning) hit the exact same issue: it rebooted unexpectedly and would panic any time one of its eight ZFS volumes was mounted.

I think the email thread on freebsd-fs contains all of the relevant information, but to give a clear picture of the two systems:

 * Both running FreeBSD 9.1-RELEASE-p13
 * Both were upgraded to 10.0-RELEASE on the same day using freebsd-update
     o freebsd-update IDS reported that all the checksums matched...
 * Both servers have a single zpool in a raidz1 configuration
 * Both servers have ECC memory, and zpool scrubs are performed once a
   month (no errors reported)
 * One server boots off of zfs, the other boots off of a standard UFS2 disk
 * One server has a SSD L2ARC, the other does not
 * One server is a Dell, the other is an iXsystems/Supermicro board
     o The Dell uses the mfi driver (H710 PERC controller), the other
       uses the mps driver (LSI controller)
 * Both servers had similar sysctl settings, and had the FreeBSD aio
   kernel module loaded (we run Samba on these servers)


These were both multi-terabyte storage nodes. One node had regular snapshots, so we were able to re-create a new ZFS volume from a snapshot. The other held just "hot" working data that does not last long, so there were no snapshots.

I'll be rebuilding these servers; at this point I'm just curious about the bad DVA message, and whether it indicates another issue I should be aware of.

Thanks!


_______________________________________________
developer mailing list
[email protected]
http://lists.open-zfs.org/mailman/listinfo/developer
