Hello Uwe,
as described in my mail, we are running kernel 4.13.0-39.
In conjunction with some later mails in this thread it seems that this problem
might be related to OS/microcode (Spectre) updates.
I am planning a Ceph/Ubuntu upgrade next week for various reasons; let's see
what happens.
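For anyone who wants to try the mitigation-off boot Uwe suggests below, a minimal sketch (assuming Ubuntu with GRUB2; shown against a scratch file named grub.test rather than the real /etc/default/grub):

```shell
# Sketch of the mitigation-off test boot. Assumes Ubuntu 16.04 with GRUB2.
# Shown against a scratch file; on a real host you would edit
# /etc/default/grub, run 'sudo update-grub' and reboot -- and revert
# afterwards, since these options reduce security.
OPTS="noibrs noibpb nopti nospectre_v2"

# stand-in for /etc/default/grub
printf 'GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"\n' > grub.test

# append the options to the default kernel command line
sed -i "s/^GRUB_CMDLINE_LINUX_DEFAULT=\"\(.*\)\"/GRUB_CMDLINE_LINUX_DEFAULT=\"\1 ${OPTS}\"/" grub.test

cat grub.test
# after rebooting the real host, verify with: cat /proc/cmdline
```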
Regards Marc
On 05.09.2018 at 20:24, Uwe Sauter wrote:
> I'm also experiencing slow requests, though I cannot tie them to scrubbing.
>
> Which kernel do you run? Would you be able to test against the same kernel
> with Spectre/Meltdown mitigations disabled ("noibrs noibpb nopti
> nospectre_v2" as boot options)?
>
> Uwe
>
> On 05.09.18 at 19:30, Brett Chancellor wrote:
>> Marc,
>> As with you, this problem manifests itself only when the bluestore OSD is
>> involved in some form of deep scrub. Anybody have any insight on what might
>> be causing this?
>>
>> -Brett
>>
>> On Mon, Sep 3, 2018 at 4:13 AM, Marc Schöchlin <[email protected]> wrote:
>>
>> Hi,
>>
>> we have also been experiencing this type of behavior for some weeks on our
>> less performance-critical HDD pools.
>> We haven't spent much time on this problem because there are currently
>> more important tasks - but here are a few details:
>>
>> Running the following loop produces output like this:
>>
>> while true; do ceph health|grep -q HEALTH_OK || (date; ceph health
>> detail); sleep 2; done
>>
>> Sun Sep 2 20:59:47 CEST 2018
>> HEALTH_WARN 4 slow requests are blocked > 32 sec
>> REQUEST_SLOW 4 slow requests are blocked > 32 sec
>> 4 ops are blocked > 32.768 sec
>> osd.43 has blocked requests > 32.768 sec
>> Sun Sep 2 20:59:50 CEST 2018
>> HEALTH_WARN 4 slow requests are blocked > 32 sec
>> REQUEST_SLOW 4 slow requests are blocked > 32 sec
>> 4 ops are blocked > 32.768 sec
>> osd.43 has blocked requests > 32.768 sec
>> Sun Sep 2 20:59:52 CEST 2018
>> HEALTH_OK
>> Sun Sep 2 21:00:28 CEST 2018
>> HEALTH_WARN 1 slow requests are blocked > 32 sec
>> REQUEST_SLOW 1 slow requests are blocked > 32 sec
>> 1 ops are blocked > 32.768 sec
>> osd.41 has blocked requests > 32.768 sec
>> Sun Sep 2 21:00:31 CEST 2018
>> HEALTH_WARN 7 slow requests are blocked > 32 sec
>> REQUEST_SLOW 7 slow requests are blocked > 32 sec
>> 7 ops are blocked > 32.768 sec
>> osds 35,41 have blocked requests > 32.768 sec
>> Sun Sep 2 21:00:33 CEST 2018
>> HEALTH_WARN 7 slow requests are blocked > 32 sec
>> REQUEST_SLOW 7 slow requests are blocked > 32 sec
>> 7 ops are blocked > 32.768 sec
>> osds 35,51 have blocked requests > 32.768 sec
>> Sun Sep 2 21:00:35 CEST 2018
>> HEALTH_WARN 7 slow requests are blocked > 32 sec
>> REQUEST_SLOW 7 slow requests are blocked > 32 sec
>> 7 ops are blocked > 32.768 sec
>> osds 35,51 have blocked requests > 32.768 sec
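As an aside, the affected OSD ids in that output can be pulled out mechanically, which helps when cross-checking them against currently scrubbing PGs. A small sketch against a captured sample (the file name health.sample is just for illustration; on a live cluster you would pipe `ceph health detail` in directly):

```shell
# Extract the OSD ids named in 'ceph health detail' output.
# Handles both the "osd.43" and the "osds 35,41" wording.
cat > health.sample <<'EOF'
HEALTH_WARN 4 slow requests are blocked > 32 sec
REQUEST_SLOW 4 slow requests are blocked > 32 sec
    4 ops are blocked > 32.768 sec
    osd.43 has blocked requests > 32.768 sec
    osds 35,41 have blocked requests > 32.768 sec
EOF
# match both forms, strip the prefix, emit one id per line, sorted
grep -oE '(osd\.|osds )[0-9,]+' health.sample \
  | sed -E 's/^osd[s.] ?//' \
  | tr ',' '\n' \
  | sort -un
```

The resulting ids can then be compared against the OSDs hosting PGs that show up in `ceph pg dump pgs_brief` with a scrubbing state.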
>>
>> Our details:
>>
>> * system details:
>> * Ubuntu 16.04
>> * Kernel 4.13.0-39
>> * 30 * 8 TB disks (SEAGATE ST8000NM0075)
>> * 3 * Dell PowerEdge R730xd (firmware 2.50.50.50)
>> * Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz
>> * 2 * 10 GBit/s SFP+ network adapters
>> * 192 GB RAM
>> * Pools are using replication factor 3, 2 MB object size,
>> 85% write load, 1700 write IOPS
>> (ops mainly between 4 KB and 16 KB in size), 300 read IOPS
>> * we have the impression that this appears during deep-scrub/scrub
>> activity.
>> * Ceph 12.2.5; we already played with the following OSD settings
>> (our assumption was that the problem is related to RocksDB
>> compaction):
>> bluestore cache kv max = 2147483648
>> bluestore cache kv ratio = 0.9
>> bluestore cache meta ratio = 0.1
>> bluestore cache size hdd = 10737418240
>> * this type of problem only appears on hdd/bluestore OSDs; ssd/bluestore
>> OSDs have never experienced it
>> * the system is healthy, no swapping, no high load, no errors in dmesg
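For reference, the cache options listed above would sit in the [osd] section of ceph.conf; the comments are my reading of what the values mean:

```ini
[osd]
# total bluestore cache per HDD OSD: 10 GiB
bluestore cache size hdd = 10737418240
# cap the RocksDB (kv) share of that cache at 2 GiB
bluestore cache kv max = 2147483648
bluestore cache kv ratio = 0.9
bluestore cache meta ratio = 0.1
```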
>>
>> I attached a log excerpt of osd.35 - perhaps this is useful for
>> investigating the problem if someone has deeper bluestore knowledge.
>> (slow requests appeared on Sun Sep 2 21:00:35)
>>
>> Regards
>> Marc
>>
>>
>> On 02.09.2018 at 15:50, Brett Chancellor wrote:
>> > The warnings look like this:
>> > 6 ops are blocked > 32.768 sec on osd.219
>> > 1 osds have slow requests
>> >
>> > On Sun, Sep 2, 2018, 8:45 AM Alfredo Deza <[email protected]> wrote:
>> >
>> > On Sat, Sep 1, 2018 at 12:45 PM, Brett Chancellor
>> > <[email protected]> wrote:
>> > > Hi Cephers,
>> > > I am in the process of upgrading a cluster from Filestore to
>> > bluestore,
>> > > but I'm concerned about frequent warnings popping up against
>> the new
>> > > bluestore devices. I'm frequently seeing messages like this,
>> > although the
>> > > specific osd changes, it's always one of the few hosts I've
>> > converted to
>> > > bluestore.
>> > >
>> > > 6 ops are blocked > 32.768 sec on osd.219
>> > > 1 osds have slow requests
>> > >
>> > > I'm running 12.2.4; have any of you seen similar issues? It
>> > seems as though
>> > > these messages pop up more frequently when one of the bluestore
>> > pgs is
>> > > involved in a scrub. I'll include my bluestore creation process
>> > below, in
>> > > case that might cause an issue. (sdb, sdc, sdd are SATA, sde and
>> > sdf are
>> > > SSD)
>> >
>> > Would be useful to include what those warnings say. The
>> ceph-volume
>> > commands look OK to me
>> >
>> > >
>> > >
>> > > ## Process used to create osds
>> > > sudo ceph-disk zap /dev/sdb /dev/sdc /dev/sdd /dev/sde
>> > /dev/sdf
>> > > sudo ceph-volume lvm zap /dev/sdb
>> > > sudo ceph-volume lvm zap /dev/sdc
>> > > sudo ceph-volume lvm zap /dev/sdd
>> > > sudo ceph-volume lvm zap /dev/sde
>> > > sudo ceph-volume lvm zap /dev/sdf
>> > > sudo sgdisk -n 0:2048:+133GiB -t 0:FFFF -c 1:"ceph block.db sdb"
>> > /dev/sdf
>> > > sudo sgdisk -n 0:0:+133GiB -t 0:FFFF -c 2:"ceph block.db sdc"
>> > /dev/sdf
>> > > sudo sgdisk -n 0:0:+133GiB -t 0:FFFF -c 3:"ceph block.db sdd"
>> > /dev/sdf
>> > > sudo sgdisk -n 0:0:+133GiB -t 0:FFFF -c 4:"ceph block.db sde"
>> > /dev/sdf
>> > > sudo ceph-volume lvm create --bluestore --crush-device-class hdd
>> > --data
>> > > /dev/sdb --block.db /dev/sdf1
>> > > sudo ceph-volume lvm create --bluestore --crush-device-class hdd
>> > --data
>> > > /dev/sdc --block.db /dev/sdf2
>> > > sudo ceph-volume lvm create --bluestore --crush-device-class hdd
>> > --data
>> > > /dev/sdd --block.db /dev/sdf3
>> > >
>> > >
>> > > _______________________________________________
>> > > ceph-users mailing list
>> > > [email protected] <mailto:[email protected]>
>> <mailto:[email protected]
>> <mailto:[email protected]>>
>> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> <http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com>
>> > >
>> >
>> >
>> >
>>
>>
>>
>>
>>
>>
>>