I'll pitch in with my personal experience.

When a single OSD in a pool becomes full (95% used by default), all client
write IO to that pool must stop, even if the other OSDs are almost empty.
This is done to protect data integrity. [1]
To avoid this you need to balance your failure domains.
For example, assuming a replicated pool with size = 2, if one of your
failure domains has a weight of 10 and the other has a weight of 3, you're
screwed. Ceph has to keep a copy in both failure domains, so when the second
failure domain nears its capacity, the first will still have more than 70%
of its storage free.
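
To make the arithmetic concrete, here is a small sketch of that two-domain
case. It assumes the weights can be read as capacities in the same unit, that
each failure domain holds exactly one full copy, and that the full ratio is
the default 0.95; the function name is made up for illustration.

```python
def free_fraction_when_smallest_fills(weights, full_ratio=0.95):
    """Fraction of each failure domain still free once the smallest one
    reaches full_ratio, assuming every domain stores one full copy."""
    # Every domain stores the same amount of data: one copy of everything.
    data_per_domain = min(weights) * full_ratio
    return [1 - data_per_domain / w for w in weights]


free = free_fraction_when_smallest_fills([10, 3])
# The weight-10 domain is still roughly 71% empty when the weight-3
# domain hits the full ratio.
print([round(f, 3) for f in free])
```
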
It's easy to calculate and predict cluster storage capacity when all your
failure domains have the same weight and their number is a multiple of your
replication size, for example if your size = 3 and you have 3, 6, 9, 12,
etc. failure domains of the same weight. It becomes not so easy when your
failure domains have different weights and their number is not a multiple
of your replicated pool size. This may be even more complicated with EC
pools, but I don't use them, so I have no experience there.
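
For the uneven case, a rough upper bound can still be computed by hand. The
sketch below is my own idealized model, not anything Ceph runs: it assumes
weights equal capacities, that each object needs `size` copies in `size`
distinct failure domains, and that placement is perfect. CRUSH places data in
proportion to weight, so a real cluster may fill up before reaching this
number.

```python
def usable_capacity_upper_bound(weights, size):
    """Upper bound on the data that fits when every object needs `size`
    copies, each in a different failure domain."""
    if len(weights) < size:
        return 0.0
    remaining = sorted(weights, reverse=True)
    replicas = size
    # A domain can hold at most one copy of each object. If the largest
    # domain is bigger than an even share of what the others can hold,
    # it can never be filled: dedicate one replica to it and solve the
    # smaller problem for the remaining domains.
    while replicas > 1 and remaining[0] > sum(remaining[1:]) / (replicas - 1):
        remaining.pop(0)
        replicas -= 1
    return sum(remaining) / replicas


print(usable_capacity_upper_bound([4, 4, 4, 4, 4, 4], 3))  # even layout
print(usable_capacity_upper_bound([10, 3], 2))             # the case above
```

With six equal domains and size = 3 the bound is simply the total weight
divided by 3. With weights 10 and 3 and size = 2 it collapses to the weight
of the small domain.
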
So what I learned is that you should build your cluster evenly, without
heavy imbalance in weights (and in IOPS for that matter, if you don't want
to get slow requests), or you will regularly end up in a situation where a
single OSD is in near_full status while the cluster reports terabytes of
free storage.
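
This is also why MAX AVAIL can move during backfill while USED stays the
same. The sketch below is a simplified model based on how the docs describe
MAX AVAIL, not Ceph's actual code: it assumes new writes land on each OSD in
proportion to its CRUSH weight and that the pool is limited by whichever OSD
would reach the full ratio first. The function name and the tuple layout are
made up for illustration.

```python
def projected_max_avail(osds, replica_size, full_ratio=0.95):
    """osds: list of (crush_weight, total, used) tuples, same unit."""
    total_weight = sum(weight for weight, _, _ in osds)
    projections = []
    for weight, total, used in osds:
        # Space left on this OSD before it reaches the full ratio.
        headroom = max(total * full_ratio - used, 0)
        # This OSD receives `share` of all new raw data, so on its own
        # it allows headroom / share more raw data cluster-wide.
        share = weight / total_weight
        projections.append(headroom / share)
    # The OSD that fills first limits the pool; divide by the replica
    # count to turn raw space into usable space.
    return min(projections) / replica_size


# Same raw usage (240 of 400) in both cases, only the distribution differs.
unbalanced = [(1, 100, 50), (1, 100, 50), (1, 100, 50), (1, 100, 90)]
balanced = [(1, 100, 60)] * 4
print(projected_max_avail(unbalanced, replica_size=2))
print(projected_max_avail(balanced, replica_size=2))
```

In this toy example the single 90% full OSD cuts the projected usable space
to a fraction of what the balanced layout offers, although the raw usage is
identical.
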

[1]
http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/#no-free-drive-space

2018-03-05 11:15 GMT+03:00 Jakub Jaszewski <[email protected]>:

> One full OSD has caused all pools to become full. Can anyone help me
> understand this?
>
> During ongoing PGs backfilling I see that MAX AVAIL values are changing
> when USED values are constant.
>
>
> GLOBAL:
>     SIZE     AVAIL     RAW USED     %RAW USED
>     425T      145T         279T         65.70
> POOLS:
>     NAME                           ID     USED       %USED     MAX AVAIL     OBJECTS
>     volumes                        3      41011G     91.14         3987G     10520026
>     default.rgw.buckets.data       20       105T     93.11         7974G     28484000
>
>
>
>
> GLOBAL:
>     SIZE     AVAIL     RAW USED     %RAW USED
>     425T      146T         279T         65.66
> POOLS:
>     NAME                           ID     USED       %USED     MAX AVAIL     OBJECTS
>     volumes                        3      41013G     88.66         5246G     10520539
>     default.rgw.buckets.data       20       105T     91.13        10492G     28484000
>
>
> From what I can read in the docs, the MAX AVAIL value is a complicated
> function of the replication or erasure code used, the CRUSH rule that maps
> storage to devices, the utilization of those devices, and the configured
> mon_osd_full_ratio.
>
> Any clue what more I can do to make better use of the available raw
> storage? Increase the number of PGs for better balanced OSD utilization?
>
> Thanks
> Jakub
>
>
>
> _______________________________________________
> ceph-users mailing list
> [email protected]
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>