Hi,

our cluster is currently in the following state:
ceph version 10.2.2

  health HEALTH_ERR
         2240 pgs are stuck inactive for more than 300 seconds
         273 pgs down
         2240 pgs peering
         2240 pgs stuck inactive
         354 requests are blocked > 32 sec
         mds cluster is degraded
  monmap e1: 3 mons at {cephmon1=10.0.0.11:6789/0,cephmon2=10.0.0.12:6789/0,cephmon3=10.0.0.13:6789/0}
         election epoch 146, quorum 0,1,2 cephmon1,cephmon2,cephmon3
   fsmap e114: 1/1/1 up {0=cephmon1=up:replay}
  osdmap e2322: 24 osds: 24 up, 24 in; 2230 remapped pgs
         flags sortbitwise
   pgmap v8774321: 2240 pgs, 4 pools, 9997 GB data, 2629 kobjects
         34753 GB used, 19173 GB / 53926 GB avail
             1957 remapped+peering
              273 down+remapped+peering
               10 peering
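
As a quick sanity check on the numbers above, the PG state counts can be tallied with a small script. This is only a sketch: the helper names tally_pg_states and count_with_flag are made up for illustration, and the input is simply the three state lines pasted from the status output, not anything read from the cluster.

```python
import re

# PG state summary lines copied from the status output above.
STATUS_LINES = """\
1957 remapped+peering
273 down+remapped+peering
10 peering
"""

TOTAL_PGS = 2240  # taken from the pgmap line


def tally_pg_states(text):
    """Return {state_string: count} for lines like '273 down+remapped+peering'."""
    counts = {}
    for line in text.splitlines():
        m = re.match(r"\s*(\d+)\s+([a-z_+]+)\s*$", line)
        if m:
            state = m.group(2)
            counts[state] = counts.get(state, 0) + int(m.group(1))
    return counts


def count_with_flag(counts, flag):
    """Sum the PGs whose combined state contains the given flag."""
    return sum(n for state, n in counts.items() if flag in state.split("+"))


counts = tally_pg_states(STATUS_LINES)
print("states add up to total:", sum(counts.values()) == TOTAL_PGS)
for flag in ("peering", "remapped", "down"):
    print(flag, count_with_flag(counts, flag))
```

With the values above, every PG carries the peering flag, and the remapped count agrees with the "2230 remapped pgs" in the osdmap line.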
health detail:
http://pastebin.com/GsQcG2U0
Sample log from one OSD:
2016-09-30 15:01:07.066632 7f2b65d70700 0 log_channel(cluster) log [WRN] : 2 slow requests, 1 included below; oldest blocked for > 659.155019 secs
2016-09-30 15:01:07.066643 7f2b65d70700 0 log_channel(cluster) log [WRN] : slow request 480.599877 seconds old, received at 2016-09-30 14:53:06.466705: osd_op(mds.0.114:4 5.64e96f8f (undecoded) ack+read+known_if_redirected+full_force e2320) currently waiting for peered
2016-09-30 15:05:06.894995 7f2b35c8c700 0 -- 10.0.1.15:6810/8033 >> 10.0.1.16:6800/1679 pipe(0x7f2b9fc50800 sd=146 :6810 s=0 pgs=0 cs=0 l=0 c=0x7f2b9eaf1800).accept connect_seq 2 vs existing 1 state open
2016-09-30 15:05:06.895558 7f2b39fcf700 0 -- 10.0.1.15:6810/8033 >> 10.0.1.16:6822/13278 pipe(0x7f2b9f199400 sd=207 :59416 s=2 pgs=47 cs=1 l=0 c=0x7f2b9f247d80).fault, initiating reconnect
2016-09-30 15:05:06.895618 7f2b3a5d5700 0 -- 10.0.1.15:6810/8033 >> 10.0.1.16:6822/13278 pipe(0x7f2b9f199400 sd=207 :59416 s=1 pgs=47 cs=2 l=0 c=0x7f2b9f247d80).fault
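
To make the slow-request entry easier to read, here is a rough sketch that pulls the age, the operation and the blocking state out of that one log line. The regular expression is my own guess at the line layout based only on the sample above, and parse_slow_request is a hypothetical helper, not part of any Ceph tooling.

```python
import re
from datetime import datetime

# One slow-request entry from the OSD log above, with the mail line wraps removed.
LOG_LINE = (
    "2016-09-30 15:01:07.066643 7f2b65d70700 0 log_channel(cluster) log "
    "[WRN] : slow request 480.599877 seconds old, received at 2016-09-30 "
    "14:53:06.466705: osd_op(mds.0.114:4 5.64e96f8f (undecoded) "
    "ack+read+known_if_redirected+full_force e2320) currently waiting for peered"
)

# Timestamp layout as it appears in the sample; assumed, not taken from a spec.
TS = r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+"
SLOW_RE = re.compile(
    rf"^(?P<logged>{TS}) .*slow request (?P<age>[\d.]+) seconds old, "
    rf"received at (?P<received>{TS}): (?P<op>.*) currently (?P<state>.+)$"
)


def parse_slow_request(line):
    """Return a dict describing a slow-request line, or None if it does not match."""
    m = SLOW_RE.match(line)
    if m is None:
        return None
    fmt = "%Y-%m-%d %H:%M:%S.%f"
    logged = datetime.strptime(m.group("logged"), fmt)
    received = datetime.strptime(m.group("received"), fmt)
    return {
        "age_secs": float(m.group("age")),                      # age as reported by the OSD
        "elapsed_secs": (logged - received).total_seconds(),    # age recomputed from timestamps
        "op": m.group("op"),
        "state": m.group("state"),
    }


info = parse_slow_request(LOG_LINE)
print("blocked in state:", info["state"])
print("reported age:", info["age_secs"], "recomputed:", info["elapsed_secs"])
```

For this entry the request comes from mds.0.114 and is blocked "waiting for peered", which fits the PGs being stuck in peering.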
MDS:
2016-09-30 14:53:05.112007 7f150e599180 0 ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374), process ceph-mds, pid 1092
2016-09-30 14:53:05.113631 7f150e599180 0 pidfile_write: ignore empty --pid-file
2016-09-30 14:53:06.455957 7f1508574700 1 mds.cephmon1 handle_mds_map standby
2016-09-30 14:53:06.467568 7f1508574700 1 mds.0.114 handle_mds_map i am now mds.0.114
2016-09-30 14:53:06.467575 7f1508574700 1 mds.0.114 handle_mds_map state change up:boot --> up:replay
2016-09-30 14:53:06.467591 7f1508574700 1 mds.0.114 replay_start
2016-09-30 14:53:06.467683 7f1508574700 1 mds.0.114 recovery set is

I have already restarted Ceph, but that did not help.
I have basically no idea what to do now.
Any help is greatly appreciated!
Thank you!
--
Best regards
Oliver Dzombic
IP-Interactive
mailto:[email protected]
Address:
IP Interactive UG ( haftungsbeschraenkt )
Zum Sonnenberg 1-3
63571 Gelnhausen
HRB 93402, Amtsgericht Hanau (district court)
Managing director: Oliver Dzombic
Tax no.: 35 236 3622 1
VAT ID: DE274086107