Quoting Stefan Kooman ([email protected]):
> Hi,
> 
> Like I said in an earlier mail to this list, we re-balanced ~ 60% of the
> CephFS metadata pool to NVMe backed devices. Roughly 422 M objects (1.2
> Billion replicated). We have 512 PGs allocated to them. While
> rebalancing we suffered from quite a few SLOW_OPS. Memory, CPU and
> device IOPS capacity were not a limiting factor as far as we can see
> (plenty of them available ... nowhere near max capacity). We saw quite a few
> slow ops with the following events:
> 
>         "time": "2019-12-19 09:41:02.712010",
>                         "event": "reached_pg"
>                     },
>                     {
>                         "time": "2019-12-19 09:41:02.712014",
>                         "event": "waiting for rw locks"
>                     },
>                     {
>                         "time": "2019-12-19 09:41:02.881939",
>                         "event": "reached_pg"
> 
> ... and this repeated hundreds of times, taking ~30 seconds to complete
> 
> Does this indicate PG lock contention?
> 
> If so ... would we need to provide more PGs to the metadata pool to avoid 
> this?
> 
> The metadata pool is only ~ 166 MiB big ... but with loads of OMAPs ...
> 
> Most advice on PG planning is concerned with the _amount_ of data ... but the
> metadata pool (and this might also be true for RGW index pools) seem to be a
> special case.

This does seem to be the case. We moved the metadata to a subset of the
cluster, which turned out not to be a good idea: those OSDs suffered badly
from it. Spreading the workload across all OSDs (reverting the change)
fixed the issues. If you have *lots* of small files and/or directories
in your cluster ... scale your metadata PGs accordingly.
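
For reference, op timelines like the one quoted above can be pulled from
the affected OSD's admin socket, and the metadata pool's PG count can be
raised online. A rough sketch; the OSD id, the pool name
"cephfs_metadata", and the target of 1024 PGs are assumptions, adjust
them for your own cluster:

```shell
# Inspect current and recent slow ops on a suspect OSD (osd.12 is an example)
ceph daemon osd.12 dump_ops_in_flight
ceph daemon osd.12 dump_historic_slow_ops

# Raise the metadata pool's PG count (pick a power of two sized for the
# OMAP workload, not just the pool's byte size)
ceph osd pool set cephfs_metadata pg_num 1024
ceph osd pool set cephfs_metadata pgp_num 1024   # older releases need this set explicitly
```

Note that splitting PGs triggers its own data movement, so this is best
done outside peak hours.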

Gr. Stefan

-- 
| BIT BV  https://www.bit.nl/        Kamer van Koophandel 09090351
| GPG: 0xD14839C6                   +31 318 648 688 / [email protected]
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
