Hello Justin,

On Tue, May 23, 2023 at 4:55 PM Justin Li <[email protected]> wrote:
>
> Dear All,
>
> After an unsuccessful upgrade to Pacific, the MDS daemons went offline and
> could not come back up. I checked the MDS log and found the messages below;
> cluster info is included below as well. I would appreciate it if anyone could
> point me in the right direction. Thanks.
>
>
> MDS log:
>
> 2023-05-24T06:21:36.831+1000 7efe56e7d700  1 mds.0.cache.den(0x600 
> 1005480d3b2) loaded already corrupt dentry: [dentry #0x100/stray0/1005480d3b2 
> [19ce,head] rep@0,-2.0 NULL (dversion lock) pv=0 
> v=2154265030 ino=(nil) state=0 0x556433addb80]
>
>     -5> 2023-05-24T06:21:36.831+1000 7efe56e7d700 -1 mds.0.damage 
> notify_dentry Damage to dentries in fragment * of ino 0x600 is fatal because 
> it is a system directory for this rank
>
>     -4> 2023-05-24T06:21:36.831+1000 7efe56e7d700  5 mds.beacon.posco 
> set_want_state: up:active -> down:damaged
>
>     -3> 2023-05-24T06:21:36.831+1000 7efe56e7d700  5 mds.beacon.posco Sending 
> beacon down:damaged seq 5339
>
>     -2> 2023-05-24T06:21:36.831+1000 7efe56e7d700 10 monclient: 
> _send_mon_message to mon.ceph-3 at v2:10.120.0.146:3300/0
>
>     -1> 2023-05-24T06:21:37.659+1000 7efe60690700  5 mds.beacon.posco 
> received beacon reply down:damaged seq 5339 rtt 0.827966
>
>      0> 2023-05-24T06:21:37.659+1000 7efe56e7d700  1 mds.posco respawn!
>
>
> Cluster info:
> root@ceph-1:~# ceph -s
>   cluster:
>     id:     e2b93a76-2f97-4b34-8670-727d6ac72a64
>     health: HEALTH_ERR
>             1 filesystem is degraded
>             1 filesystem is offline
>             1 mds daemon damaged
>
>   services:
>     mon: 3 daemons, quorum ceph-1,ceph-2,ceph-3 (age 26h)
>     mgr: ceph-3(active, since 15h), standbys: ceph-1, ceph-2
>     mds: 0/1 daemons up, 3 standby
>     osd: 135 osds: 133 up (since 10h), 133 in (since 2w)
>
>   data:
>     volumes: 0/1 healthy, 1 recovering; 1 damaged
>     pools:   4 pools, 4161 pgs
>     objects: 230.30M objects, 276 TiB
>     usage:   836 TiB used, 460 TiB / 1.3 PiB avail
>     pgs:     4138 active+clean
>              13   active+clean+scrubbing
>              10   active+clean+scrubbing+deep
>
>
>
> root@ceph-1:~# ceph health detail
> HEALTH_ERR 1 filesystem is degraded; 1 filesystem is offline; 1 mds daemon 
> damaged
> [WRN] FS_DEGRADED: 1 filesystem is degraded
>     fs cephfs is degraded
> [ERR] MDS_ALL_DOWN: 1 filesystem is offline
>     fs cephfs is offline because no MDS is active for it.
> [ERR] MDS_DAMAGE: 1 mds daemon damaged
>     fs cephfs mds.0 is damaged

Do you have a complete log you can share? Try:

https://docs.ceph.com/en/quincy/man/8/ceph-post-file/
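
For example, assuming the MDS log is in the usual /var/log/ceph/ location on the
host running mds.posco (adjust the path to match your deployment):

ceph-post-file -d 'mds.0 damaged after pacific upgrade' /var/log/ceph/ceph-mds.posco.log

It should print an upload id you can paste back here so we can find the file.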

To get your upgrade to complete, you may set:

ceph config set mds mds_go_bad_corrupt_dentry false
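
That keeps the MDS from marking itself damaged on these dentries, but rank 0 is
already flagged damaged in the FSMap, so you will likely also need to clear that
flag before a standby will take the rank. A rough sketch, using the fs name and
rank from your "ceph health detail" output (verify with "ceph fs status" before
and after):

ceph mds repaired cephfs:0
ceph fs status cephfs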

--
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]