Hi Laura,

Thank you for the feedback. I'm wondering whether I could remove the corrupted file from the MDT and clear the file error. With the error cleared, the Lustre storage might start again. I understand some files would definitely be lost, but at least we would have a chance to recover the rest.
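What I have in mind is roughly the sketch below, reusing the pool, dataset, and object path from my earlier mail. I'm assuming that the oi.3 object can simply be removed and that zpool clear plus a fresh scrub would then empty the permanent-error list; please correct me if either assumption is wrong.

pcs cluster standby --all                  # stop the MDS/OSS resources first, as before
zpool import LustreMDT
zfs set canmount=on LustreMDT/mdt0-work
zfs mount LustreMDT/mdt0-work
# remove the corrupted object; this may fail with the same I/O error
# if the directory itself cannot be read
rm -rf "/LustreMDT/mdt0-work/oi.3/0x200000003:0x2:0x0"
zfs unmount LustreMDT/mdt0-work
zfs set canmount=off LustreMDT/mdt0-work
zpool clear LustreMDT                      # reset the pool error counters
zpool scrub LustreMDT                      # re-scrub so the error list is re-evaluated
zpool status -v LustreMDT                  # hope for "No known data errors"
pcs cluster unstandby --all                # let Pacemaker try to start the MDT again

Is removing anything under oi.3 by hand safe, or would the MDT refuse to start (or need something like lfsck) afterwards?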
Best,
Ian

On Sat, Sep 24, 2022 at 5:35 AM Laura Hild <[email protected]> wrote:
> Hi Ian-
>
> It looks to me like that hardware RAID array is giving ZFS data back that
> is not what ZFS thinks it wrote. Since from ZFS’ perspective there is no
> redundancy in the pool, only what the RAID array returns, ZFS cannot
> reconstruct the file to its satisfaction, and rather than return data that
> ZFS thinks is corrupt, it is refusing to allow that file to be accessed at
> all. Lustre, which relies on the lower layers for redundancy, expects the
> file to be accessible, and it’s not.
>
> -Laura
>
>
> ________________________________________
> From: lustre-discuss <[email protected]> on behalf of Ian
> Yi-Feng Chang via lustre-discuss <[email protected]>
> Sent: Wednesday, 21 September 2022 10:53
> To: Robert Anderson; [email protected]
> Subject: [EXTERNAL] Re: [lustre-discuss] ZFS file error of MDT
>
> Thanks, Robert, for the feedback. Actually, I do not know about Lustre at
> all.
> I am also trying to contact the engineer who built the Lustre system for
> more information regarding the drives.
> To my knowledge, the LustreMDT pool is a 4-SSD disk group (named
> /dev/mapper/SSD) with hardware RAID5.
>
> I can manually mount LustreMDT/mdt0-work with the following steps:
>
> pcs cluster standby --all (stop MDS and OSS)
> zpool import LustreMDT
> zfs set canmount=on LustreMDT/mdt0-work
> zfs mount LustreMDT/mdt0-work
>
> Then when I ls the file /LustreMDT/mdt0-work/oi.3/0x200000003:0x2:0x0 it
> returns an I/O error, but other files look fine.
> [root@mds1 mdt0-work]# ls -ahlt
> "/LustreMDT/mdt0-work/oi.3/0x200000003:0x2:0x0"
> ls: reading directory /LustreMDT/mdt0-work/oi.3/0x200000003:0x2:0x0:
> Input/output error
> total 23M
> drwxr-xr-x 2 root root 2 Jan 1 1970 .
> drwxr-xr-x 0 root root 0 Jan 1 1970 ..
>
> Is this the drive-failure situation you are referring to?
>
> Best,
> Ian
>
>
> On Wed, Sep 21, 2022 at 9:32 PM Robert Anderson <[email protected]> wrote:
> I could be reading your zpool status output wrong, but it looks like you
> had 2 drives in that pool. Not mirrored, so no fault tolerance. Any drive
> failure would lose half of the pool data.
>
> Unless you can get that drive working, you are missing half of your data
> and have no resilience to errors, nothing to recover from.
>
> However you proceed, you should ensure that you have a mirrored zfs pool
> or more drives and raidz (I like raidz2).
>
>
> On September 20, 2022 11:57:09 PM Ian Yi-Feng Chang via lustre-discuss
> <[email protected]> wrote:
>
> Dear All,
> I think this problem is more related to ZFS, but I would like to ask for
> help from experts in all fields.
> Our MDT cannot work properly after the IB switch was accidentally rebooted
> (power issue).
> Everything looks good except that the MDT cannot be started.
> Our MDT's ZFS pool has no backup or snapshot.
> I would like to ask: can this problem be fixed, and how?
>
> Thanks for your help in advance.
>
> Best,
> Ian
>
> Lustre: Build Version: 2.10.4
> OS: CentOS Linux release 7.5.1804 (Core)
> uname -r: 3.10.0-862.el7.x86_64
>
>
> [root@mds1 etc]# pcs status
> Cluster name: mdsgroup01
> Stack: corosync
> Current DC: mds1 (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with
> quorum
> Last updated: Wed Sep 21 11:46:25 2022
> Last change: Wed Sep 21 11:46:13 2022 by root via cibadmin on mds1
>
> 2 nodes configured
> 9 resources configured
>
> Online: [ mds1 mds2 ]
>
> Full list of resources:
>
>  Resource Group: group-MDS
>      zfs-LustreMDT (ocf::heartbeat:ZFS):   Started mds1
>      MGT (ocf::lustre:Lustre):   Started mds1
>      MDT (ocf::lustre:Lustre):   Stopped
>  ipmi-fencingMDS1 (stonith:fence_ipmilan):   Started mds2
>  ipmi-fencingMDS2 (stonith:fence_ipmilan):   Started mds2
>  Clone Set: healthLUSTRE-clone [healthLUSTRE]
>      Started: [ mds1 mds2 ]
>  Clone Set: healthLNET-clone [healthLNET]
>      Started: [ mds1 mds2 ]
>
> Failed Actions:
> * MDT_start_0 on mds1 'unknown error' (1): call=44, status=complete,
> exitreason='',
>     last-rc-change='Tue Sep 20 15:01:51 2022', queued=0ms, exec=317ms
> * MDT_start_0 on mds2 'unknown error' (1): call=48, status=complete,
> exitreason='',
>     last-rc-change='Tue Sep 20 14:38:18 2022', queued=0ms, exec=25168ms
>
>
> Daemon Status:
>   corosync: active/enabled
>   pacemaker: active/enabled
>   pcsd: active/enabled
>
>
>
> After zpool scrub MDT, the zpool status -v of the MDT pool reported:
>
>   pool: LustreMDT
>  state: ONLINE
> status: One or more devices has experienced an error resulting in data
> corruption. Applications may be affected.
> action: Restore the file in question if possible. Otherwise restore the
> entire pool from backup.
>    see: http://zfsonlinux.org/msg/ZFS-8000-8A
>   scan: scrub repaired 0B in 0h35m with 1 errors on Wed Sep 21 09:38:24
> 2022
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         LustreMDT   ONLINE       0     0     2
>           SSD       ONLINE       0     0     8
>
> errors: Permanent errors have been detected in the following files:
>
>         LustreMDT/mdt0-work:/oi.3/0x200000003:0x2:0x0
>
>
>
> # dmesg -T
> [Tue Sep 20 15:01:43 2022] Lustre: Lustre: Build Version: 2.10.4
> [Tue Sep 20 15:01:43 2022] LNet: Using FMR for registration
> [Tue Sep 20 15:01:43 2022] LNet: Added LNI 172.29.32.21@o2ib [8/256/0/180]
> [Tue Sep 20 15:01:50 2022] Lustre: MGS: Connection restored to
> b5823059-e620-64ac-79f6-e5282f2fa442 (at 0@lo)
> [Tue Sep 20 15:01:50 2022] LustreError: 3839:0:(llog.c:1296:llog_backup())
> MGC172.29.32.21@o2ib: failed to open log work-MDT0000: rc = -5
> [Tue Sep 20 15:01:50 2022] LustreError:
> 3839:0:(mgc_request.c:1897:mgc_llog_local_copy()) MGC172.29.32.21@o2ib:
> failed to copy remote log work-MDT0000: rc = -5
> [Tue Sep 20 15:01:50 2022] LustreError: 13a-8: Failed to get MGS log
> work-MDT0000 and no local copy.
> [Tue Sep 20 15:01:50 2022] LustreError: 15c-8: MGC172.29.32.21@o2ib: The
> configuration from log 'work-MDT0000' failed (-2). This may be the result
> of communication errors between this node and the MGS, a bad configuration,
> or other errors. See the syslog for more information.
> [Tue Sep 20 15:01:50 2022] LustreError:
> 3839:0:(obd_mount_server.c:1386:server_start_targets()) failed to start
> server work-MDT0000: -2
> [Tue Sep 20 15:01:50 2022] LustreError:
> 3839:0:(obd_mount_server.c:1879:server_fill_super()) Unable to start
> targets: -2
> [Tue Sep 20 15:01:50 2022] LustreError:
> 3839:0:(obd_mount_server.c:1589:server_put_super()) no obd work-MDT0000
> [Tue Sep 20 15:01:50 2022] Lustre: server umount work-MDT0000 complete
> [Tue Sep 20 15:01:50 2022] LustreError:
> 3839:0:(obd_mount.c:1582:lustre_fill_super()) Unable to mount (-2)
> [Tue Sep 20 15:01:56 2022] Lustre:
> 4112:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has
> timed out for slow reply: [sent 1663657311/real 1663657311]
> req@ffff8d6f0e728000 x1744471122247856/t0(0) o251->MGC172.29.32.21@o2ib
> @0@lo:26/25 lens 224/224 e 0 to 1 dl 1663657317 ref 2 fl
> Rpc:XN/0/ffffffff rc 0/-1
> [Tue Sep 20 15:01:56 2022] Lustre: server umount MGS complete
> [Tue Sep 20 15:02:29 2022] Lustre: MGS: Connection restored to
> b5823059-e620-64ac-79f6-e5282f2fa442 (at 0@lo)
> [Tue Sep 20 15:02:54 2022] Lustre: MGS: Connection restored to
> 28ec81ea-0d51-d721-7be2-4f557da2546d (at 172.29.32.1@o2ib)
>
>
