Hi Jessie,

For clarification: it sounds like you are using hardware-based RAID-6 rather than 
ZFS RAID. Is that correct? Or was the faulty card simply an HBA?

At the bottom of the ‘zpool status -v pool_name’ output, you may see paths 
and/or ZFS object IDs of the damaged/impacted files. These are worth taking 
note of.
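
For example, the tail of the output might look like this (the pool/dataset 
name and object ID below are illustrative only):

zpool status -v source_pool
  ...
errors: Permanent errors have been detected in the following files:

        source_pool/source_ost:<0x2c4d3>

An entry of the form dataset:<0xNNNN> is an object ID that ZFS could not 
resolve to a path, which is expected for Lustre OST objects.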

Running a ‘zpool scrub’ is a good idea. If the zpool is protected with ZFS 
RAID (mirror or raidz), the scrub may be able to repair some of the damage. If 
the zpool is not protected with ZFS RAID, the scrub will identify any other 
errors, but will likely NOT repair the damage.
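
Something like this, with the pool name as a placeholder:

zpool scrub source_pool
zpool status source_pool

The second command shows scrub progress while it runs and, once it completes, 
reports on the ‘scan:’ line how much data (if any) was repaired.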

If you have enough disk space on hardware that is behaving properly (and free 
space in the source zpool), you may want to replicate the dataset(s) (OSTs) 
that are reporting errors. Having a replica affords you the ability to examine 
the data without fear of further damage. You can also extract from the 
replica(s) the files that are producing I/O errors on the source.

Something like this for replication should work:

zfs snap source_pool/source_ost@timestamp_label
zfs send -Rv source_pool/source_ost@timestamp_label | zfs receive destination_pool/source_ost_replicated
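
Once the receive finishes, you can confirm the dataset and its snapshot 
arrived with:

zfs list -r -t all destination_pool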

You will need to set zfs_send_corrupt_data to 1 in /sys/module/zfs/parameters, 
or the ‘zfs send’ will error and fail when sending a dataset with read and/or 
checksum errors. With zfs_send_corrupt_data enabled, the send completes, and 
any blocks that are unreadable on the source side are filled with the pattern 
0x2f5baddb10c on the destination side. This can be helpful in troubleshooting, 
since it lets you tell whether an entire file is corrupt or only parts of it. 
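
To set it (sysfs path as on a standard ZFS-on-Linux install):

echo 1 > /sys/module/zfs/parameters/zfs_send_corrupt_data

And one way to check a recovered file for the fill pattern, assuming the fill 
is written as native-endian 64-bit words:

od -A d -t x8 suspect_file | grep -i 2f5baddb10c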

After the replication, you should set the replicated dataset to read-only with 
‘zfs set readonly=on destination_pool/source_ost_replicated’

Hopefully others can chime in about the Lustre errors you have noted. 

Thanks,
Tom
 

> On Dec 12, 2016, at 3:33 PM, Jesse Stroik <[email protected]> wrote:
> 
> One of our lustre file systems still running lustre 2.5.3 and zfs 0.6.3 
> experienced corruption due to a bad RAID controller. The OST in question was 
> a RAID6 volume which we've marked inactive. Most of our lustre clients are 
> 2.8.0.
> 
> zpool status reports corruption and checksum errors. I have not run a scrub 
> since the corruption was detected, but we did replace the bad RAID controller 
> and subsequent write tests to that OST have been fine. We haven't seen a 
> change in the error count with the new RAID controller.
> 
> We're observing two types of errors. The first is that when we attempt to 
> perform a long listing of a file to get its metadata, we get "cannot allocate 
> memory" on our client. On the OSS in question, it's logged as:
> 
> ============
> LustreError: 10394:0:(ldlm_resource.c:1188:ldlm_resource_get()) 
> odyssey-OST0002: lvbo_init failed for resource 0x8ccfa8:0x0: rc = -5
> LustreError: 8855:0:(osd_object.c:409:osd_object_init()) odyssey-OST0002: 
> lookup [0x100000000:0x8ccf64:0x0]/0x78ed06 failed: rc = -5
> ============
> 
> As far as we can tell, this primarily affects recently written files and 
> we're presently using robinhood to generate a file listing from OST2 to try 
> to verify all files for this particular error.
> 
> We do have another error: attempts to read a few of our larger files on that 
> OST result in I/O errors after a partial read. I'm not sure why this would 
> have happened with the bad RAID controller as the two files we're aware of 
> weren't being written to.
> 
> I'm interested to learn a bit more about these particular Lustre errors and 
> return codes, and what our most likely recovery options are.
> 
> Best,
> Jesse
> 
