Hi Igor,
Thanks for your reply.
I can confirm that discard is disabled in our cluster:
10:03 root@node106b [fra]:~# ceph daemon osd.417 config show | grep discard
"bdev_async_discard": "false",
"bdev_enable_discard": "false",
[...]
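To run the same check without grep, here is a minimal sketch that filters the discard-related options out of the config dump. It assumes the output of "config show" is a flat JSON object of option names and string values, as it appears above. The function name discard_options and the sample string are only illustrative, and "some_other_option" is a made-up placeholder, not a real Ceph option.

```python
import json


def discard_options(config_json: str) -> dict:
    """Return every option whose name contains 'discard'.

    Assumes config_json is a flat JSON object mapping option names
    to values, e.g. the text printed by
    `ceph daemon osd.417 config show`.
    """
    config = json.loads(config_json)
    return {name: value for name, value in config.items() if "discard" in name}


# Hypothetical sample: the two options shown above plus one made-up
# placeholder option that should be filtered out.
sample = (
    '{"bdev_async_discard": "false", '
    '"bdev_enable_discard": "false", '
    '"some_other_option": "1"}'
)

for name, value in sorted(discard_options(sample).items()):
    print(f"{name} = {value}")
```

On a node, the JSON text could be piped in from the command above instead of using the sample string.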
So there must be something else causing the problems.
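To narrow down which OSDs block and when, the "get_health_metrics reporting N slow ops" lines can be tallied per OSD. The following is only a rough sketch: the regular expression is derived from the two log lines quoted below (unwrapped to one line each), and the names SLOW_OPS and peak_slow_ops are my own, not part of Ceph.

```python
import re

# Pattern derived from the quoted log lines:
# <date> <time> <thread id> <log level> osd.<id> <epoch> get_health_metrics reporting <n> slow ops
SLOW_OPS = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) \S+ +-?\d+ "
    r"(?P<osd>osd\.\d+) \d+ get_health_metrics reporting (?P<count>\d+) slow ops"
)


def peak_slow_ops(lines):
    """Map each OSD to (highest slow-op count seen, timestamp of that line)."""
    peaks = {}
    for line in lines:
        match = SLOW_OPS.match(line)
        if match is None:
            continue  # not a slow-ops report
        osd = match["osd"]
        count = int(match["count"])
        if osd not in peaks or count > peaks[osd][0]:
            peaks[osd] = (count, match["ts"])
    return peaks


# The two lines from the quoted log, joined back into single lines.
sample_lines = [
    "2019-02-14 23:53:39.754 7f379a368700 -1 osd.417 340516 get_health_metrics "
    "reporting 3 slow ops, oldest is osd_op(client.84226977.0:5112539976 0.dff "
    "0.1d783dff (undecoded) ondisk+read+known_if_redirected e340516)",
    "2019-02-14 23:53:40.706 7f379a368700 -1 osd.417 340516 get_health_metrics "
    "reporting 7 slow ops, oldest is osd_op(client.84226977.0:5112539976 0.dff "
    "0.1d783dff (undecoded) ondisk+read+known_if_redirected e340516)",
]

for osd, (count, ts) in sorted(peak_slow_ops(sample_lines).items()):
    print(f"{osd}: peak of {count} slow ops at {ts}")
```

Run over the logs of all OSDs on a node, this would show whether the stalls line up in time across OSDs or stay confined to a single disk.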
Thanks,
Denny
> On 15.02.2019 at 12:41, Igor Fedotov <[email protected]> wrote:
>
> Hi Denny,
>
> I don't remember exactly when discard support appeared in BlueStore, but it is
> disabled by default.
>
> See the bdev_enable_discard option.
>
>
> Thanks,
>
> Igor
>
> On 2/15/2019 2:12 PM, Denny Kreische wrote:
>> Hi,
>>
>> Two weeks ago we upgraded one of our Ceph clusters from Luminous 12.2.8 to
>> Mimic 13.2.4. The cluster is SSD-only and BlueStore-only, with 68 nodes and
>> 408 OSDs. Since then we have seen strange behaviour: single OSDs seem to
>> block for around 5 minutes, which causes the whole cluster and the connected
>> applications to hang. This has happened 5 times during the last 10 days, at
>> irregular times; it did not happen before the upgrade.
>>
>> The OSD log shows something like this (more log here:
>> https://pastebin.com/6BYam5r4):
>>
>> [...]
>> 2019-02-14 23:53:39.754 7f379a368700 -1 osd.417 340516 get_health_metrics
>> reporting 3 slow ops, oldest is osd_op(client.84226977.0:5112539976 0.dff
>> 0.1d783dff (undecoded) ondisk+read+known_if_redirected e340516)
>> 2019-02-14 23:53:40.706 7f379a368700 -1 osd.417 340516 get_health_metrics
>> reporting 7 slow ops, oldest is osd_op(client.84226977.0:5112539976 0.dff
>> 0.1d783dff (undecoded) ondisk+read+known_if_redirected e340516)
>> [...]
>>
>> In this example osd.417 seems to have a problem. I can see the same log lines
>> in the logs of other OSDs whose placement groups are related to osd.417.
>> I assume that all placement groups related to osd.417 hang or are blocked
>> while osd.417 is blocked.
>>
>> How can I see in detail what might cause a certain OSD to stop working?
>>
>> The cluster uses SSDs from 3 different vendors (Micron, Samsung, Intel),
>> but so far only the Micron disks are affected. We previously had problems
>> with Micron SSDs under FileStore (XFS), where fstrim caused single OSDs to
>> block for several minutes. We migrated to BlueStore about a year ago. Just
>> in case: is there any kind of SSD trim/discard happening in BlueStore since
>> Mimic?
>>
>> Thanks,
>> Denny
>>
>> _______________________________________________
>> ceph-users mailing list
>> [email protected]
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
--
Denny Kreische
IT Systems Engineer and Consultant
Am Teichdamm 20
04680 Colditz
Phone: 034381 55125
Mobile: 0176 2115 1457