Hello Justin,

On Tue, May 23, 2023 at 4:55 PM Justin Li <[email protected]> wrote:
>
> Dear All,
>
> After an unsuccessful upgrade to pacific, the MDS went offline and could not
> get back up. I checked the MDS log and found the entries below; see the
> cluster info below as well. I'd appreciate it if anyone can point me in the
> right direction. Thanks.
>
>
> MDS log:
>
> 2023-05-24T06:21:36.831+1000 7efe56e7d700  1 mds.0.cache.den(0x600 1005480d3b2) loaded already corrupt dentry: [dentry #0x100/stray0/1005480d3b2 [19ce,head] rep@0,-2.0 NULL (dversion lock) pv=0 v=2154265030 ino=(nil) state=0 0x556433addb80]
>
>     -5> 2023-05-24T06:21:36.831+1000 7efe56e7d700 -1 mds.0.damage notify_dentry Damage to dentries in fragment * of ino 0x600is fatal because it is a system directory for this rank
>
>     -4> 2023-05-24T06:21:36.831+1000 7efe56e7d700  5 mds.beacon.posco set_want_state: up:active -> down:damaged
>
>     -3> 2023-05-24T06:21:36.831+1000 7efe56e7d700  5 mds.beacon.posco Sending beacon down:damaged seq 5339
>
>     -2> 2023-05-24T06:21:36.831+1000 7efe56e7d700 10 monclient: _send_mon_message to mon.ceph-3 at v2:10.120.0.146:3300/0
>
>     -1> 2023-05-24T06:21:37.659+1000 7efe60690700  5 mds.beacon.posco received beacon reply down:damaged seq 5339 rtt 0.827966
>
>      0> 2023-05-24T06:21:37.659+1000 7efe56e7d700  1 mds.posco respawn!
>
>
> Cluster info:
>
> root@ceph-1:~# ceph -s
>   cluster:
>     id:     e2b93a76-2f97-4b34-8670-727d6ac72a64
>     health: HEALTH_ERR
>             1 filesystem is degraded
>             1 filesystem is offline
>             1 mds daemon damaged
>
>   services:
>     mon: 3 daemons, quorum ceph-1,ceph-2,ceph-3 (age 26h)
>     mgr: ceph-3(active, since 15h), standbys: ceph-1, ceph-2
>     mds: 0/1 daemons up, 3 standby
>     osd: 135 osds: 133 up (since 10h), 133 in (since 2w)
>
>   data:
>     volumes: 0/1 healthy, 1 recovering; 1 damaged
>     pools:   4 pools, 4161 pgs
>     objects: 230.30M objects, 276 TiB
>     usage:   836 TiB used, 460 TiB / 1.3 PiB avail
>     pgs:     4138 active+clean
>              13   active+clean+scrubbing
>              10   active+clean+scrubbing+deep
>
>
> root@ceph-1:~# ceph health detail
> HEALTH_ERR 1 filesystem is degraded; 1 filesystem is offline; 1 mds daemon damaged
> [WRN] FS_DEGRADED: 1 filesystem is degraded
>     fs cephfs is degraded
> [ERR] MDS_ALL_DOWN: 1 filesystem is offline
>     fs cephfs is offline because no MDS is active for it.
> [ERR] MDS_DAMAGE: 1 mds daemon damaged
>     fs cephfs mds.0 is damaged
Do you have a complete log you can share? Try:

https://docs.ceph.com/en/quincy/man/8/ceph-post-file/

To get your upgrade to complete, you may set:

    ceph config set mds mds_go_bad_corrupt_dentry false

--
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
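
For reference, the log upload could look roughly like this. The description
text, the daemon name (posco, taken from the log above) and the
package-install log path are assumptions, so adjust them to your deployment
(cephadm keeps MDS logs under /var/log/ceph/<fsid>/ instead):

    # Upload the full MDS log; ceph-post-file prints an id you can share
    # on the list so the developers can fetch the file.
    ceph-post-file -d 'mds.0 damaged after pacific upgrade' \
        /var/log/ceph/ceph-mds.posco.log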
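
And a rough sketch of how the workaround might be applied end to end,
assuming the filesystem is named cephfs and the damaged rank is 0 (both per
the health output above); the 'ceph mds repaired' step is only needed if the
rank stays marked damaged after the config change:

    # Let the MDS tolerate the corrupt dentry instead of going down:damaged.
    ceph config set mds mds_go_bad_corrupt_dentry false

    # Clear the damaged flag on rank 0 so one of the standbys can take it.
    ceph mds repaired cephfs:0

    # Watch the rank come back up and the cluster leave HEALTH_ERR.
    ceph fs status cephfs
    ceph -s

    # Once an MDS (e.g. mds.posco) is active again, review the recorded
    # damage before attempting any further repair.
    ceph tell mds.posco damage ls

    # Drop the override once the underlying damage has been dealt with.
    ceph config rm mds mds_go_bad_corrupt_dentry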
