I started my experimental 1-host/8-HDD setup in 2018 with Luminous,
and I read
https://ceph.io/community/new-luminous-erasure-coding-rbd-cephfs/ ,
which got me interested in using BlueStore and overwrite-enabled EC pools
for RBD data.
I have about 22 TiB of raw storage, and ceph df shows this:

--- RAW STORAGE ---
CLASS    SIZE    AVAIL    USED  RAW USED  %RAW USED
hdd    22 TiB  2.7 TiB  19 TiB    19 TiB      87.78
TOTAL  22 TiB  2.7 TiB  19 TiB    19 TiB      87.78

--- POOLS ---
POOL                   ID  PGS   STORED  OBJECTS     USED  %USED  MAX AVAIL
jerasure21              1  256  9.0 TiB    2.32M   13 TiB  97.06    276 GiB
libvirt                 2  128  1.5 TiB  413.60k  4.5 TiB  91.77    140 GiB
rbd                     3   32  798 KiB        5  2.7 MiB      0    138 GiB
iso                     4   32  2.3 MiB       10  8.0 MiB      0    138 GiB
device_health_metrics   5    1   31 MiB        9   94 MiB   0.02    138 GiB

If I add up USED for libvirt and jerasure21, I get 17.5 TiB, while
RAW STORAGE/AVAIL shows 2.7 TiB.
The sum of POOLS/MAX AVAIL is about 840 GiB; where is the other
2.7 - 0.840 ≈ 1.86 TiB?
Or, to put it differently, where is my (RAW STORAGE/RAW USED) -
SUM(POOLS/USED) = 19 - 17.5 = 1.5 TiB?
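For reference, here is a rough back-of-the-envelope sketch of how min_alloc_size rounding can inflate USED on an EC pool. It assumes the 64 KiB bluestore_min_alloc_size_hdd default and a jerasure k=2, m=1 profile as mentioned later in the thread; the object sizes fed in are illustrative guesses, not measured values:

```python
# Rough estimate of BlueStore allocation overhead for an EC pool.
# Assumes bluestore_min_alloc_size_hdd = 64 KiB (the old HDD default)
# and a jerasure k=2, m=1 profile; object sizes are hypothetical.

MIN_ALLOC = 64 * 1024          # bytes, per-chunk allocation unit
K, M = 2, 1                    # jerasure 2+1

def overhead_ratio(avg_object_size):
    # Each object is split into K data chunks plus M coding chunks;
    # every chunk is rounded up to a multiple of MIN_ALLOC on disk.
    chunk = avg_object_size / K
    alloc_per_chunk = -(-chunk // MIN_ALLOC) * MIN_ALLOC  # ceil to unit
    allocated = alloc_per_chunk * (K + M)
    return allocated / avg_object_size

# Large (4 MiB) RBD objects only pay the 2+1 redundancy cost,
# while small objects pay the rounding penalty on top of it:
print(overhead_ratio(4 * 1024 * 1024))   # 1.5 (just the 2+1 redundancy)
print(overhead_ratio(98 * 1024))         # ≈ 1.96, well above the ideal 1.5
```

So even at the same redundancy level, a pool full of small objects shows a noticeably worse allocated:stored ratio.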

As it does not seem I will get any more hosts for this setup,
I am seriously considering tearing this Ceph cluster down
and instead storing qcow2 images on Btrfs, served over iSCSI,
which looks simpler to me for a single-host situation.

Josh Baergen wrote:
Hey Wladimir,

I actually don't know where this is referenced in the docs, if anywhere. 
Googling around shows many people discovering this overhead the hard way on 
ceph-users.

I also don't know the rbd journaling mechanism in enough depth to comment on whether it could be causing this issue for you. Are you seeing a high allocated:stored ratio on your cluster?

Josh

On Sun, Jul 4, 2021 at 6:52 AM Wladimir Mutel <[email protected]> wrote:

    Dear Mr Baergen,

    thanks a lot for your very concise explanation;
    however, I would like to learn more about why the default BlueStore
    allocation size causes such a big storage overhead,
    and where the Ceph docs explain what to watch for in order to avoid
    hitting this phenomenon again and again.
    I have a feeling this is what I am seeing on my experimental Ceph setup
    with its simplest-possible JErasure 2+1 data pool.
    Could it be caused by journaled RBD writes to the EC data pool?

    Josh Baergen wrote:
     > Hey Arkadiy,
     >
     > If the OSDs are on HDDs and were created with the default
     > bluestore_min_alloc_size_hdd, which is still 64KiB in Octopus, then in
     > effect data will be allocated from the pool in 640KiB chunks (64KiB *
     > (k+m)). 5.36M objects taking up 501GiB is an average object size of
     > 98KiB, which results in a ratio of 6.53:1 allocated:stored, which is
     > pretty close to the 7:1 observed.
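That arithmetic can be sanity-checked with a quick sketch; the 501 GiB and 5.36M figures come from the ceph df output quoted below, and the 64 KiB default and k=6, m=4 profile are as stated in the thread:

```python
# Reproduce the allocated:stored estimate: with
# bluestore_min_alloc_size_hdd = 64 KiB and k=6, m=4, each object
# consumes at least 64 KiB * (k+m) = 640 KiB of raw space.

KIB = 1024
min_alloc = 64 * KIB
k, m = 6, 4

stored = 501 * 1024**3           # 501 GiB reported as STORED
objects = 5.36e6                 # object count from ceph df
avg_object = stored / objects    # average object size in bytes
print(avg_object / KIB)          # ≈ 98 KiB

min_raw_per_object = min_alloc * (k + m)   # 640 KiB floor per object
ratio = min_raw_per_object / avg_object
print(ratio)                     # ≈ 6.53:1 allocated:stored
```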
     >
     > If my assumption about your configuration is correct, then the only
     > way to fix this is to adjust bluestore_min_alloc_size_hdd and recreate
     > all your OSDs, which will take a while...
     >
     > Josh
     >
     > On Tue, Jun 29, 2021 at 3:07 PM Arkadiy Kulev <[email protected]> wrote:
     >
     >> The pool *default.rgw.buckets.data* has *501 GiB* stored, but USED
     >> shows *3.5 TiB* (7 times higher!):
     >>
     >> root@ceph-01:~# ceph df
     >> --- RAW STORAGE ---
     >> CLASS  SIZE     AVAIL    USED     RAW USED  %RAW USED
     >> hdd    196 TiB  193 TiB  3.5 TiB   3.6 TiB       1.85
     >> TOTAL  196 TiB  193 TiB  3.5 TiB   3.6 TiB       1.85
     >>
     >> --- POOLS ---
     >> POOL                       ID  PGS  STORED   OBJECTS  USED     %USED  MAX AVAIL
     >> device_health_metrics       1    1   19 KiB       12   56 KiB      0     61 TiB
     >> .rgw.root                   2   32  2.6 KiB        6  1.1 MiB      0     61 TiB
     >> default.rgw.log             3   32  168 KiB      210   13 MiB      0     61 TiB
     >> default.rgw.control         4   32      0 B        8      0 B      0     61 TiB
     >> default.rgw.meta            5    8  4.8 KiB       11  1.9 MiB      0     61 TiB
     >> default.rgw.buckets.index   6    8  1.6 GiB      211  4.7 GiB      0     61 TiB
     >> default.rgw.buckets.data   10  128  501 GiB    5.36M  3.5 TiB   1.90    110 TiB
     >>
     >> The *default.rgw.buckets.data* pool is using erasure coding:
     >>
     >> root@ceph-01:~# ceph osd erasure-code-profile get EC_RGW_HOST
     >> crush-device-class=hdd
     >> crush-failure-domain=host
     >> crush-root=default
     >> jerasure-per-chunk-alignment=false
     >> k=6
     >> m=4
     >> plugin=jerasure
     >> technique=reed_sol_van
     >> w=8
     >>
     >> If anyone could help explain why it's using up 7 times more space,
     >> it would help a lot. Versioning is disabled. ceph version 15.2.13
     >> (octopus stable).
     >>
     >> Sincerely,
     >> Ark.
     >> _______________________________________________
     >> ceph-users mailing list -- [email protected]
     >> To unsubscribe send an email to [email protected]
     >>

