Hello,
I've inherited a Ceph cluster from someone who left no documentation and gave
no handover. A couple of days ago it decided to show the entire company what
it is capable of.
The health report looks like this:
[root@host mnt]# ceph -s
cluster:
id: 809718aa-3eac-4664-b8fa-38c46cdbfdab
health: HEALTH_ERR
1 MDSs report damaged metadata
1 MDSs are read only
2 MDSs report slow requests
6 MDSs behind on trimming
Reduced data availability: 2 pgs stale
Degraded data redundancy: 2593/186803520 objects degraded (0.001%), 2 pgs degraded, 2 pgs undersized
1 slow requests are blocked > 32 sec. Implicated osds
716 stuck requests are blocked > 4096 sec. Implicated osds 25,31,38
services:
mon: 3 daemons, quorum f,rook-ceph-mon2,rook-ceph-mon0
mgr: a(active)
mds: ceph-fs-2/2/2 up odd-fs-2/2/2 up
{[ceph-fs:0]=ceph-fs-5b997cbf7b-5tjwh=up:active,[ceph-fs:1]=ceph-fs-5b997cbf7b-nstqz=up:active,[user-fs:0]=odd-fs-5668c75f9f-hflps=up:active,[user-fs:1]=odd-fs-5668c75f9f-jf59x=up:active}, 4 up:standby-replay
osd: 39 osds: 39 up, 38 in
data:
pools: 5 pools, 706 pgs
objects: 91212k objects, 4415 GB
usage: 10415 GB used, 13024 GB / 23439 GB avail
pgs: 2593/186803520 objects degraded (0.001%)
703 active+clean
2 stale+active+undersized+degraded
1 active+clean+scrubbing+deep
io:
client: 168 kB/s rd, 6336 B/s wr, 10 op/s rd, 1 op/s wr
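As a quick sanity check on the figures in the report above, here is a minimal Python sketch that recomputes the ratios. All numbers are copied from the `ceph -s` output; the only assumption is that the "k" in "91212k objects" means 1024, which I have not confirmed for this Ceph version.

```python
# Figures copied verbatim from the `ceph -s` output above.
degraded_copies = 2593
total_copies = 186803520
used_gb, avail_gb, total_gb = 10415, 13024, 23439
pg_states = {
    "active+clean": 703,
    "stale+active+undersized+degraded": 2,
    "active+clean+scrubbing+deep": 1,
}
pgs_total = 706

# Share of object copies that are degraded, in percent.
degraded_pct = 100.0 * degraded_copies / total_copies

# Share of raw capacity in use, in percent.
used_pct = 100.0 * used_gb / total_gb

# Rough number of copies per object.
# ASSUMPTION: "91212k" uses k = 1024; this is a guess, not confirmed.
approx_objects = 91212 * 1024
copies_per_object = total_copies / approx_objects

print(f"degraded copies:   {degraded_pct:.4f}%")
print(f"raw capacity used: {used_pct:.1f}%")
print(f"copies per object: {copies_per_object:.2f} (if k = 1024)")
print(f"PG states add up:  {sum(pg_states.values()) == pgs_total}")
```

If the 1024 assumption holds, the result is close to two copies per object, which would suggest the pools are running with a replication size of 2. That is only an inference from the totals, not something the report states.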
The offending broken MDS entry (damaged metadata) seems to be this:
mds.ceph-fs-5b997cbf7b-5tjwh: [
{
"damage_type": "dir_frag",
"id": 1190692215,
"ino": 2199023258131,
"frag": "*",
"path": "/f/01/59"
}
]
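To make the damage entry above easier to cross-reference, here is a minimal Python sketch that parses it and derives a candidate object name in the metadata pool. It rests on an assumption I have not verified on this cluster: that directory fragment objects are named "<inode in hex>.<fragment as 8 hex digits>", and that a frag of "*" means the unsplit directory, i.e. fragment 00000000. The function name is hypothetical and only for illustration.

```python
import json

# Damage entry copied from the output above.
DAMAGE_JSON = """
[
  {
    "damage_type": "dir_frag",
    "id": 1190692215,
    "ino": 2199023258131,
    "frag": "*",
    "path": "/f/01/59"
  }
]
"""


def candidate_dirfrag_object(ino, frag):
    """Guess the metadata-pool object name for a damaged directory fragment.

    ASSUMPTION: objects are named "<ino hex>.<frag as 8 hex digits>" and
    frag "*" denotes the unsplit directory (fragment 00000000).
    """
    if frag != "*":
        # Split directories need real frag decoding, which this sketch skips.
        raise ValueError("only the unsplit case ('*') is handled here")
    return f"{ino:x}.00000000"


entries = json.loads(DAMAGE_JSON)
for entry in entries:
    name = candidate_dirfrag_object(entry["ino"], entry["frag"])
    print(f"{entry['damage_type']} id={entry['id']} "
          f"path={entry['path']} candidate object={name}")
```

If the naming assumption is right, the resulting name could be checked with `rados -p <metadata pool> stat <object>`, and `ceph osd map <metadata pool> <object>` should show which PG and OSDs it maps to. That might reveal whether the damaged directory lives in one of the two stale PGs. The metadata pool name is a placeholder here since I don't know what it is called on this cluster.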
Does anyone have an idea how I can diagnose this and find out what is wrong?
For the other issues, I'm not even sure where I should start looking.
Cheers,
Sangwhan
_______________________________________________
ceph-users mailing list
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com