Sorry, I didn't explain the problem well. What I see is that once the
OSDs fail, the cluster recovers, but the MDS remains faulty:
# ceph status
  cluster:
    id:     c74da5b8-3d1b-483e-8b3a-739134db6cf8
    health: HEALTH_WARN
            3 clients failing to respond to capability release
            2 MDSs report slow metadata IOs
            2 MDSs report slow requests
            2 MDSs behind on trimming
            Reduced data availability: 256 pgs inactive, 18 pgs down, 238 pgs incomplete
            22 slow ops, oldest one blocked for 26719 sec, daemons [osd.134,osd.210,osd.244,osd.251,osd.301,osd.514,osd.520,osd.528,osd.642,osd.713]... have slow ops.

  services:
    mon: 3 daemons, quorum ceph2mon01,ceph2mon02,ceph2mon03 (age 23h)
    mgr: ceph2mon02(active, since 6d), standbys: ceph2mon01, ceph2mon03
    mds: nxtclfs:2 {0=ceph2mon01=up:active,1=ceph2mon02=up:active} 1 up:standby
    osd: 768 osds: 736 up (since 7h), 736 in (since 7h)

  data:
    pools:   2 pools, 16384 pgs
    objects: 33.39M objects, 39 TiB
    usage:   64 TiB used, 2.6 PiB / 2.6 PiB avail
    pgs:     1.562% pgs not active
             16128 active+clean
             238   incomplete
             18    down
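
In case it helps to see where I'm looking, these are the usual commands
for digging into this kind of state (the PG id is a placeholder; substitute
one of the incomplete PGs reported by dump_stuck):

# ceph health detail
# ceph pg dump_stuck inactive
# ceph pg <pgid> query

The pg query output should say which OSDs the PG is waiting for (fields
like down_osds_we_would_probe or peering_blocked_by), and on the MDS side
'ceph daemon mds.ceph2mon01 objecter_requests' (run on the host where that
MDS lives) lists the RADOS operations the MDS is blocked on.
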
On 5/5/21 at 11:00, Andres Rojas Guerrero wrote:
> Yes, the main problem is that the MDSs start to respond slowly, the
> information is no longer accessible, and the cluster never recovers.
>
>
> # ceph status
>   cluster:
>     id:     c74da5b8-3d1b-483e-8b3a-739134db6cf8
>     health: HEALTH_WARN
>             2 clients failing to respond to capability release
>             2 MDSs report slow metadata IOs
>             1 MDSs report slow requests
>             2 MDSs behind on trimming
>             Reduced data availability: 238 pgs inactive, 8 pgs down, 230 pgs incomplete
>             Degraded data redundancy: 1400453/220552172 objects degraded (0.635%), 461 pgs degraded, 464 pgs undersized
>             241 slow ops, oldest one blocked for 638 sec, daemons [osd.101,osd.127,osd.155,osd.166,osd.172,osd.189,osd.200,osd.210,osd.214,osd.233]... have slow ops.
>
>   services:
>     mon: 3 daemons, quorum ceph2mon01,ceph2mon02,ceph2mon03 (age 25h)
>     mgr: ceph2mon02(active, since 6d), standbys: ceph2mon01, ceph2mon03
>     mds: nxtclfs:2 {0=ceph2mon01=up:active,1=ceph2mon02=up:active} 1 up:standby
>     osd: 768 osds: 736 up (since 11m), 736 in (since 95s); 416 remapped pgs
>
>   data:
>     pools:   2 pools, 16384 pgs
>     objects: 33.40M objects, 39 TiB
>     usage:   63 TiB used, 2.6 PiB / 2.6 PiB avail
>     pgs:     1.489% pgs not active
>              1400453/220552172 objects degraded (0.635%)
>              15676 active+clean
>              285   active+undersized+degraded+remapped+backfill_wait
>              230   incomplete
>              176   active+undersized+degraded+remapped+backfilling
>              8     down
>              6     peering
>              3     active+undersized+remapped
>
> On 5/5/21 at 10:54, David Caro wrote:
>>
>> Can you share more information?
>>
>> The output of 'ceph status' while the OSD is down would help, and
>> 'ceph health detail' could also be useful.
>>
>> On 05/05 10:48, Andres Rojas Guerrero wrote:
>>> Hi, I have a Nautilus cluster (version 14.2.6), and I have noticed that
>>> when some OSDs go down the cluster doesn't start to recover. I have
>>> checked that the noout option is unset.
>>>
>>> What could be the reason for this behavior?
>>>
>>>
>>>
>>
>
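
Coming back to the original question about recovery not starting: besides
noout, a quick sanity check (just a sketch, nothing cluster-specific
assumed) is to confirm no other flags are blocking recovery and to look at
the pool settings:

# ceph osd dump | grep flags
# ceph osd pool ls detail

The flags line should not contain norecover, nobackfill or norebalance,
and 'ceph osd pool ls detail' shows the size/min_size of the two pools;
if I understand the 'incomplete' state correctly, it means peering cannot
find an authoritative copy of those PGs among the OSDs that are still up.
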
--
*******************************************************
Andrés Rojas Guerrero
Unidad Sistemas Linux
Area Arquitectura Tecnológica
Secretaría General Adjunta de Informática
Consejo Superior de Investigaciones Científicas (CSIC)
Pinar 19
28006 - Madrid
Tel: +34 915680059 -- Ext. 990059
email: [email protected]
ID comunicate.csic.es: @50852720l:matrix.csic.es
*******************************************************
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]