Hi Jesse,

Regarding your seeing 370 objects with errors from ‘zpool status’ but over 
400 files with access issues, I would suggest running a ‘zpool scrub’ to 
identify all of the ZFS objects in the pool that are reporting permanent 
errors. 
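As a rough sketch (assuming your pool is named pool-01, as in the example 
further down), you could kick off the scrub and wait for it to finish before 
pulling the final error list:

```shell
# Hypothetical pool name -- substitute your own.
pool=pool-01

# Start the scrub; it runs in the background.
zpool scrub "$pool"

# Poll until the scrub completes, then list permanent errors.
while zpool status "$pool" | grep -q 'scrub in progress'; do
    sleep 60
done
zpool status -v "$pool"
```

On a large pool the scrub can take many hours, so you may prefer to just 
check ‘zpool status’ periodically instead of looping.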

It is very important to have a complete list of the files with issues before 
replicating the VDEV(s) in question. 
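One way to keep that list (again assuming the hypothetical pool name pool-01) 
is to save a dated copy of the ‘zpool status -v’ output each time, so runs 
can be compared later:

```shell
# Hypothetical pool name and output directory -- adjust for your site.
pool=pool-01

# Save a dated copy of the error list so later runs can be diffed.
out="/home/zpool_status_${pool}_$(date +%Y%m%d).out"
zpool status -v "$pool" > "$out"
```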

You may also want to dump the zdb information for the source VDEV(s) with the 
following: 

zdb -dddddd source_pool/source_vdev > /some/where/with/room

For example, if the zpool was named pool-01, and the VDEV was named lustre-0001 
and you had free space in a filesystem named /home:

zdb -dddddd pool-01/lustre-0001 > /home/zdb_pool-01_0001_20161212.out

There is a great wealth of data zdb can share about your files. Having the 
output may prove helpful down the road.

Thanks,
Tom

> On Dec 12, 2016, at 4:39 PM, Jesse Stroik <[email protected]> wrote:
> 
> Thanks for taking the time to respond, Tom,
> 
> 
>> For clarification, it sounds like you are using hardware based RAID-6, and 
>> not ZFS raid? Is this correct? Or was the faulty card simply an HBA?
> 
> 
> You are correct. This particular file system is still using hardware RAID6.
> 
> 
>> At the bottom of the ‘zpool status -v pool_name’ output, you may see paths 
>> and/or zfs object ID’s of the damaged/impacted files. This would be good to 
>> take note of.
> 
> 
> Yes, I output this to files at a few different times and we've had no change 
> since replacing the RAID controller, which makes me feel reasonably 
> comfortable leaving the file system in production.
> 
> There are 370 objects listed by zpool status -v but I am unable to access at 
> least 400 files. Almost all of our files are single stripe.
> 
> 
>> Running a ‘zpool scrub’ is a good idea. If the zpool is protected with "ZFS 
>> raid", the scrub may be able to repair some of the damage. If the zpool is 
>> not protected with "ZFS raid", the scrub will identify any other errors, but 
>> likely NOT repair any of the damage.
> 
> 
> We're not protected with ZFS RAID, just hardware raid6. I could run a patrol 
> on the hardware controller and then a ZFS scrub if that makes the most sense 
> at this point. This file system is scheduled to run a scrub the third week of 
> every month so it would run one this weekend otherwise.
> 
> 
> 
>> If you have enough disk space on hardware that is behaving properly (and 
>> free space in the source zpool), you may want to replicate the VDEV’s (OST) 
>> that are reporting errors. Having a replicated VDEV can afford you the 
>> ability to examine the data without fear of further damage. You may also 
>> want to extract certain files from the replicated VDEV(s) which are 
>> producing IO errors on the source VDEV.
>> 
>> Something like this for replication should work:
>> 
>> zfs snap source_pool/source_ost@timestamp_label
>> zfs send -Rv source_pool/source_ost@timestamp_label | zfs receive 
>> destination_pool/source_ost_replicated
>> 
>> You will need to set zfs_send_corrupt_data to 1 in 
>> /sys/module/zfs/parameters or the ‘zfs send’ will error and fail when 
>> sending a VDEV with read and/or checksum errors.
>> Enabling zfs_send_corrupt_data allows the zfs send operation to complete. 
>> Any blocks that are damaged on the source side will be filled with the 
>> pattern 0x2f5baddb10c on the destination side. This can help you determine 
>> whether an entire file is corrupt or only parts of it.
>> 
>> After the replication, you should set the replicated VDEV to read only with 
>> ‘zfs set readonly=on destination_pool/source_ost_replicated’
>> 
> 
> Thank you for this suggestion. We'll most likely do that.
> 
> Best,
> Jesse Stroik
> 

_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
