Hi Ian-
It looks to me like that hardware RAID array is giving ZFS back data that does 
not match what ZFS thinks it wrote.  Since from ZFS’s perspective there is no 
redundancy in the pool, only what the RAID array returns, ZFS cannot 
reconstruct the file to its satisfaction, and rather than return data it 
considers corrupt, it refuses to allow the file to be accessed at all.  
Lustre, which relies on the lower layers for redundancy, expects the file to be 
accessible, and it’s not.
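As a toy illustration of that behavior (not real ZFS code; every name below is made up): ZFS keeps a checksum for each block, and on read it only returns data that passes the checksum, falling back to a redundant copy if one exists, and otherwise failing with EIO.

```shell
# Toy model of a ZFS read path (illustration only, not real ZFS code):
# return data only if it matches the stored checksum; fall back to a
# redundant copy if one passes; otherwise fail with EIO (5).
read_block() {  # usage: read_block DATA_FILE CHECKSUM_FILE [MIRROR_FILE]
    if [ "$(sha256sum < "$1" | cut -d' ' -f1)" = "$(cat "$2")" ]; then
        cat "$1"; return 0
    fi
    if [ -n "${3:-}" ] && [ "$(sha256sum < "$3" | cut -d' ' -f1)" = "$(cat "$2")" ]; then
        cat "$3"; return 0          # "self-heal": serve the good copy
    fi
    echo "Input/output error" >&2
    return 5                        # EIO
}
```

With a mirror argument the bad copy is silently papered over; without one, the caller gets an I/O error, which is exactly the `ls` failure shown later in this thread.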
-Laura

________________________________________
From: lustre-discuss <[email protected]> on behalf of Ian 
Yi-Feng Chang via lustre-discuss <[email protected]>
Sent: Wednesday, 21 September 2022 10:53
To: Robert Anderson; [email protected]
Subject: [EXTERNAL] Re: [lustre-discuss] ZFS file error of MDT

Thanks, Robert, for the feedback. Actually, I do not know much about Lustre at all.
I am also trying to contact the engineer who built the Lustre system for more 
information about the drives.
To my knowledge, the LustreMDT pool is a hardware RAID5 group of 4 SSDs, 
presented as /dev/mapper/SSD.

I can manually mount LustreMDT/mdt0-work with the following steps:

pcs cluster standby --all (Stop MDS and OSS)
zpool import LustreMDT
zfs set canmount=on LustreMDT/mdt0-work
zfs mount LustreMDT/mdt0-work
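
(For completeness, and untested on my side, I assume the pool is handed back to 
the cluster by reversing the steps above, roughly:)

```shell
# Assumed reverse of the manual-mount steps above; verify before use.
zfs umount LustreMDT/mdt0-work
zfs set canmount=off LustreMDT/mdt0-work
zpool export LustreMDT
pcs cluster unstandby --all   # let Pacemaker manage MDS/OSS again
```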

Then, when I ls the file /LustreMDT/mdt0-work/oi.3/0x200000003:0x2:0x0, it 
returns an I/O error, but other files look fine:
[root@mds1 mdt0-work]# ls -ahlt "/LustreMDT/mdt0-work/oi.3/0x200000003:0x2:0x0"
ls: reading directory /LustreMDT/mdt0-work/oi.3/0x200000003:0x2:0x0: 
Input/output error
total 23M
drwxr-xr-x 2 root root 2 Jan  1  1970 .
drwxr-xr-x 0 root root 0 Jan  1  1970 ..

Is this the drive failure situation you were referring to?

Best,
Ian


On Wed, Sep 21, 2022 at 9:32 PM Robert Anderson 
<[email protected]<mailto:[email protected]>> wrote:
I could be reading your zpool status output wrong, but it looks like you had 2 
drives in that pool. Not mirrored, so no fault tolerance. Any drive failure 
would lose half of the pool data.

Unless you can get that drive working, you are missing half of your data and 
have no resilience to errors, nothing to recover from.

However you proceed, you should ensure that you have a mirrored ZFS pool, or 
more drives and raidz (I like raidz2).
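
(For reference, and with hypothetical device names, those layouts would be 
created along these lines:)

```shell
# Two-way mirror: survives one device failure, and gives ZFS a second
# copy to repair checksum errors from (device names are made up).
zpool create LustreMDT mirror /dev/disk/by-id/nvme-SSD0 /dev/disk/by-id/nvme-SSD1

# raidz2 across four or more devices: survives any two device failures.
zpool create LustreMDT raidz2 /dev/disk/by-id/nvme-SSD0 /dev/disk/by-id/nvme-SSD1 \
    /dev/disk/by-id/nvme-SSD2 /dev/disk/by-id/nvme-SSD3
```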


On September 20, 2022 11:57:09 PM Ian Yi-Feng Chang via lustre-discuss 
<[email protected]<mailto:[email protected]>> wrote:


Dear All,
I think this problem is more related to ZFS, but I would like to ask for help 
from experts in all fields.
Our MDT has not worked properly since the IB switch was accidentally rebooted 
(a power issue).
Everything looks good except that the MDT cannot be started.
Our MDT's ZFS pool has no backup or snapshot.
Could this problem be fixed, and if so, how?

Thanks for your help in advance.

Best,
Ian

Lustre: Build Version: 2.10.4
OS: CentOS Linux release 7.5.1804 (Core)
uname -r: 3.10.0-862.el7.x86_64


[root@mds1 etc]# pcs status
Cluster name: mdsgroup01
Stack: corosync
Current DC: mds1 (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
Last updated: Wed Sep 21 11:46:25 2022
Last change: Wed Sep 21 11:46:13 2022 by root via cibadmin on mds1

2 nodes configured
9 resources configured

Online: [ mds1 mds2 ]

Full list of resources:

 Resource Group: group-MDS
     zfs-LustreMDT      (ocf::heartbeat:ZFS):   Started mds1
     MGT        (ocf::lustre:Lustre):   Started mds1
     MDT        (ocf::lustre:Lustre):   Stopped
 ipmi-fencingMDS1       (stonith:fence_ipmilan):        Started mds2
 ipmi-fencingMDS2       (stonith:fence_ipmilan):        Started mds2
 Clone Set: healthLUSTRE-clone [healthLUSTRE]
     Started: [ mds1 mds2 ]
 Clone Set: healthLNET-clone [healthLNET]
     Started: [ mds1 mds2 ]

Failed Actions:
* MDT_start_0 on mds1 'unknown error' (1): call=44, status=complete, 
exitreason='',
    last-rc-change='Tue Sep 20 15:01:51 2022', queued=0ms, exec=317ms
* MDT_start_0 on mds2 'unknown error' (1): call=48, status=complete, 
exitreason='',
    last-rc-change='Tue Sep 20 14:38:18 2022', queued=0ms, exec=25168ms


Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled



After running zpool scrub, zpool status -v of the MDT pool reported:

  pool: LustreMDT
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://zfsonlinux.org/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 0h35m with 1 errors on Wed Sep 21 09:38:24 2022
config:

        NAME        STATE     READ WRITE CKSUM
        LustreMDT   ONLINE       0     0     2
          SSD       ONLINE       0     0     8

errors: Permanent errors have been detected in the following files:

        LustreMDT/mdt0-work:/oi.3/0x200000003:0x2:0x0



# dmesg -T
[Tue Sep 20 15:01:43 2022] Lustre: Lustre: Build Version: 2.10.4
[Tue Sep 20 15:01:43 2022] LNet: Using FMR for registration
[Tue Sep 20 15:01:43 2022] LNet: Added LNI 172.29.32.21@o2ib [8/256/0/180]
[Tue Sep 20 15:01:50 2022] Lustre: MGS: Connection restored to 
b5823059-e620-64ac-79f6-e5282f2fa442 (at 0@lo)
[Tue Sep 20 15:01:50 2022] LustreError: 3839:0:(llog.c:1296:llog_backup()) 
MGC172.29.32.21@o2ib: failed to open log work-MDT0000: rc = -5
[Tue Sep 20 15:01:50 2022] LustreError: 
3839:0:(mgc_request.c:1897:mgc_llog_local_copy()) MGC172.29.32.21@o2ib: failed 
to copy remote log work-MDT0000: rc = -5
[Tue Sep 20 15:01:50 2022] LustreError: 13a-8: Failed to get MGS log 
work-MDT0000 and no local copy.
[Tue Sep 20 15:01:50 2022] LustreError: 15c-8: MGC172.29.32.21@o2ib: The 
configuration from log 'work-MDT0000' failed (-2). This may be the result of 
communication errors between this node and the MGS, a bad configuration, or 
other errors. See the syslog for more information.
[Tue Sep 20 15:01:50 2022] LustreError: 
3839:0:(obd_mount_server.c:1386:server_start_targets()) failed to start server 
work-MDT0000: -2
[Tue Sep 20 15:01:50 2022] LustreError: 
3839:0:(obd_mount_server.c:1879:server_fill_super()) Unable to start targets: -2
[Tue Sep 20 15:01:50 2022] LustreError: 
3839:0:(obd_mount_server.c:1589:server_put_super()) no obd work-MDT0000
[Tue Sep 20 15:01:50 2022] Lustre: server umount work-MDT0000 complete
[Tue Sep 20 15:01:50 2022] LustreError: 
3839:0:(obd_mount.c:1582:lustre_fill_super()) Unable to mount  (-2)
[Tue Sep 20 15:01:56 2022] Lustre: 
4112:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed 
out for slow reply: [sent 1663657311/real 1663657311]  req@ffff8d6f0e728000 
x1744471122247856/t0(0) o251->MGC172.29.32.21@o2ib@0@lo:26/25 lens 224/224 e 0 
to 1 dl 1663657317 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
[Tue Sep 20 15:01:56 2022] Lustre: server umount MGS complete
[Tue Sep 20 15:02:29 2022] Lustre: MGS: Connection restored to 
b5823059-e620-64ac-79f6-e5282f2fa442 (at 0@lo)
[Tue Sep 20 15:02:54 2022] Lustre: MGS: Connection restored to 
28ec81ea-0d51-d721-7be2-4f557da2546d (at 172.29.32.1@o2ib)


_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
