Hello,

I have a question about ZFS-8000-8A and block volumes.
I have 2 mirror sets in one zpool.
Build 134 amd64 (upgraded from 2009.06 when build 134 was released).
Pool version is still 13.

  pool: data
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub completed after 4h16m with 2 errors on Fri Sep 17 13:19:04 2010
config:

        NAME        STATE     READ WRITE CKSUM
        data        DEGRADED     0     0    28
          mirror-0  DEGRADED     0     0    56
            c0t0d0  DEGRADED     0     0    56  too many errors
            c9t0d0  DEGRADED     0     0    56  too many errors
          mirror-1  ONLINE       0     0     0
            c0t1d0  ONLINE       0     0     0
            c9t1d0  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        data/zlun02:<0x1>
        data/zlun03:<0x1>

I SAN-boot my systems, and this block volume is a Windows install; Windows boots
and runs fine. It does, however, indicate in Event Viewer that the disk has a
bad block. I have been running this setup since build 99 and boot CentOS, Win2k8,
and Vista/7 from it.
ZFS is now unable to read data from the mirror but was able to write it before,
so I assume this is either a controller/system fault, a disk fault, or a ZFS
fault? Could a client-side FC driver/HBA fault cause this?
I did a full scrub of the pool twice. The first time only zlun02 showed up; then
I accessed zlun03 via the Windows 7 install running on zlun02, and now it shows
as a problem as well.

The system also runs 2 virtual CentOS machines from this pool "data". I also
have a Samba share configured, and data is accessed from it, sometimes heavily,
with no problems so far for anything else.
All other pools on the system are also fine.

The system had been up for 64 days with no ungraceful shutdown of either the
server or the system accessing the block volume when the problems started.
I did a full shutdown and power-on, then a zpool clear and a scrub, but the
errors remain.
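For reference, the clear-and-scrub sequence was roughly:

```shell
zpool clear data        # reset the pool's error counters
zpool scrub data        # re-verify every block against its checksum
zpool status -v data    # the permanent errors are still listed afterwards
```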

My real question is: how can I make a backup/move of the block volume zlun02
via ZFS, or is this impossible? Due to licensing on some software, it would be
a real nightmare to reinstall (once I find out what the problem is).
I tried making a snapshot of the volume and using zfs send/recv, but this
fails, as can be expected.
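Since zfs send aborts on checksum errors, one rough salvage idea (assuming the
zvol is exposed under the usual /dev/zvol/rdsk path on this build; the
destination name zlun02copy and the volsize are placeholders) is a raw block
copy that skips unreadable blocks:

```shell
# Sketch only -- zlun02copy and the size are hypothetical; check the
# real size first with: zfs get volsize data/zlun02
zfs create -V 20g data/zlun02copy

# conv=noerror keeps dd going past unreadable blocks; sync pads the
# failed reads with zeros so offsets in the copy stay aligned.
dd if=/dev/zvol/rdsk/data/zlun02 \
   of=/dev/zvol/rdsk/data/zlun02copy \
   bs=1048576 conv=noerror,sync
```

The unreadable blocks come out zero-filled, so chkdsk will likely complain
about the copy, but most of the volume should survive.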
Any ideas would be welcomed.
Also, if anyone knows of a tool I can use to test the disk(s) without causing
damage to ZFS, please post; offline or online does not matter, as long as the
tool has been tested with ZFS on an affected disk.

Needless to say, these disks are under heavy load, and it could be that I am
just unlucky enough for both to fail at the same time. I even split each
mirror set between 2 controllers.
I have 4 spare (cold) disks, but I do not know what the result of replacing
each disk would be. I assume the rebuild/resilver will fail, since not all the
data is available anymore.
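For what it's worth, my understanding is that a mirror resilver copies whatever
the surviving side can still read, so a replace should at least complete, with
the permanent errors staying flagged. A sketch, where c10t0d0 is a placeholder
for the spare's real device name:

```shell
# c10t0d0 is hypothetical -- substitute the spare's actual cXtYdZ name.
zpool replace data c0t0d0 c10t0d0

# Watch the resilver progress and any remaining errors.
zpool status -v data
```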
Thanks,
-- 
This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
