Isn't it more likely that these are errors on data as well? I think ZFS retries read operations when there's a checksum failure, so maybe these are transient hardware problems (faulty cables, high temperature, ...)?
This would explain the non-existence of unrecoverable errors.

Robert Milkowski wrote:
> Hello Robert,
>
> Thursday, March 29, 2007, 12:37:28 AM, you wrote:
>
> RM> Hello Robert,
>
> RM> Wednesday, March 21, 2007, 10:36:15 AM, you wrote:
>
> RM>> Hello Robert,
>
> RM>> Saturday, March 17, 2007, 6:49:05 PM, you wrote:
>
> RM>>> Hello Thomas,
>
> RM>>> Saturday, March 17, 2007, 11:46:14 AM, you wrote:
>
> TN>>>> On Fri, 16 Mar 2007, Anton B. Rang wrote:
>>>>>> It's possible (if unlikely) that you are only getting checksum errors on
>>>>>> metadata. Since ZFS always internally mirrors its metadata, even on
>>>>>> non-redundant pools, it can recover from metadata corruption which does
>>>>>> not affect all copies. (If there is only one LUN, the mirroring happens
>>>>>> at different locations on the same LUN.)
>
> TN>>>> I thought about that but looking at the NFS server the real data should be
> TN>>>> much much more than metadata so I would consider it unlikely. Also in the
> TN>>>> now redundant setup we see checksum errors on both attached RAIDs
>
> TN>>>> Any hints on how to track down the problem to the HBA, cables, RAID and so
> TN>>>> on? We see similar things on all our machines with few exceptions. Talking
> TN>>>> to local Sun folks we have been "warned" before that checksum errors will
> TN>>>> show up and that it's considered normal. Nevertheless I really want to
> TN>>>> know what they are about
>
> RM>>> I have an opened CR for months now about the same problem - lot of
> RM>>> CKSUM errors all seem to be only meta-data related which is highly
> RM>>> unlikely.
>
> RM>> We've reinstalled servers to U3 and SC3.2 and for last few days no
> RM>> single CKSUM error (the same pools were imported) - so maybe something
> RM>> wrong was with U2.
>
> RM> One of those server has reported again some CKSUM errors in the same way so it
> RM> looks like only metadata were involved. So the problem is still there
> RM> but on U3 to much less extent.
>
> bash-3.00# uname -a
> SunOS XXXXX 5.10 Generic_118833-36 sun4u sparc SUNW,Sun-Fire-V240
> bash-3.00#
>
>
> [...]
>
>   pool: nfs-s5-s6
>  state: ONLINE
> status: One or more devices has experienced an unrecoverable error. An
>         attempt was made to correct the error. Applications are unaffected.
> action: Determine if the device needs to be replaced, and clear the errors
>         using 'zpool clear' or replace the device with 'zpool replace'.
>    see: http://www.sun.com/msg/ZFS-8000-9P
>  scrub: none requested
> config:
>
>         NAME                                     STATE     READ WRITE CKSUM
>         nfs-s5-s6                                ONLINE       0     0     7
>           c4t600C0FF00000000009258F4855B59001d0  ONLINE       0     0     7
>
> errors: No known data errors
>
>   pool: nfs-s5-s7
>  state: ONLINE
> status: One or more devices has experienced an unrecoverable error. An
>         attempt was made to correct the error. Applications are unaffected.
> action: Determine if the device needs to be replaced, and clear the errors
>         using 'zpool clear' or replace the device with 'zpool replace'.
>    see: http://www.sun.com/msg/ZFS-8000-9P
>  scrub: none requested
> config:
>
>         NAME                                     STATE     READ WRITE CKSUM
>         nfs-s5-s7                                ONLINE       0     0     6
>           c4t600C0FF00000000009258F28706F5201d0  ONLINE       0     0     6
>
> errors: No known data errors
>
>   pool: nfs-s5-s8
>  state: ONLINE
> status: One or more devices has experienced an unrecoverable error. An
>         attempt was made to correct the error. Applications are unaffected.
> action: Determine if the device needs to be replaced, and clear the errors
>         using 'zpool clear' or replace the device with 'zpool replace'.
>    see: http://www.sun.com/msg/ZFS-8000-9P
>  scrub: none requested
> config:
>
>         NAME                                     STATE     READ WRITE CKSUM
>         nfs-s5-s8                                ONLINE       0     0    10
>           c4t600C0FF00000000009258F3E4C4C5601d0  ONLINE       0     0    10
>
> errors: No known data errors
> bash-3.00#
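
If the errors really are transient, it may help to watch whether the CKSUM counters keep growing over time. Below is a rough sketch, not anything shipped with ZFS, that tallies the CKSUM column from saved `zpool status` text in the layout quoted above. The function name `cksum_by_pool` and the `ROW` pattern are my own inventions for illustration, and the parser assumes plain integer counters and the column order NAME STATE READ WRITE CKSUM.

```python
import re

# Hypothetical helper (not part of ZFS). Assumes config rows look like
# "NAME STATE READ WRITE CKSUM" with plain integer counters, as in the
# output quoted in this thread.
ROW = re.compile(
    r"^\s*(\S+)\s+(ONLINE|DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)"
    r"\s+(\d+)\s+(\d+)\s+(\d+)\s*$"
)


def cksum_by_pool(status_text):
    """Return {pool: {device_or_pool_name: cksum_count}}."""
    result = {}
    pool = None
    for line in status_text.splitlines():
        # Strip mail quoting ("> ") so text pasted from a reply parses too.
        line = re.sub(r"^(\s*>)+ ?", "", line)
        header = re.match(r"^\s*pool:\s*(\S+)", line)
        if header:
            pool = header.group(1)
            result[pool] = {}
            continue
        row = ROW.match(line)
        if row and pool is not None:
            result[pool][row.group(1)] = int(row.group(5))
    return result


SAMPLE = """\
  pool: nfs-s5-s6
 state: ONLINE
config:

        NAME                                     STATE     READ WRITE CKSUM
        nfs-s5-s6                                ONLINE       0     0     7
          c4t600C0FF00000000009258F4855B59001d0  ONLINE       0     0     7

errors: No known data errors
"""

if __name__ == "__main__":
    for pool, devices in cksum_by_pool(SAMPLE).items():
        for name, count in devices.items():
            print(pool, name, count)
```

Saving the output at intervals and comparing the tallies would show whether the counts are still climbing, which might help narrow things down to a particular LUN, cable, or HBA.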