Hi everybody,
We have a Ceph Luminous cluster with 184 SSD OSDs. About a year ago we
noticed abnormal growth in one of the cluster's pools.
This pool is mirrored to another Ceph cluster in another datacenter.
Below is the consumption of the two main pools.
#PRIMARY CLUSTER
[root@ceph01 ~]# ceph df detail
GLOBAL:
    SIZE      AVAIL     RAW USED    %RAW USED    OBJECTS
    659TiB    240TiB    419TiB      63.60        43.34M
POOLS:
    NAME          ID    QUOTA OBJECTS    QUOTA BYTES    USED       %USED    MAX AVAIL    OBJECTS     DIRTY      READ       WRITE      RAW USED
    images-dr     8     N/A              N/A            1.24TiB    6.42     18.2TiB      163522      163.52k    42.6GiB    247MiB     3.73TiB
    volumes       11    N/A              N/A            59.1TiB    68.46    27.2TiB      18945218    18.95M     4.81GiB    4.16GiB    118TiB
    volumes-dr    12    N/A              N/A            143TiB     83.99    27.2TiB      22108005    22.11M     1.84GiB    918MiB     286TiB
To verify the actual consumption of the images within the pools, we run
rbd diff on every image in the pool and then add up the results.
for j in $(rbd ls volumes)
do
  size=$(rbd diff "volumes/$j" | awk '{ SUM += $2 } END { print SUM/1024/1024/1024 " GB" }')
  echo "$j;$size" >> /var/lib/report-volumes/$(date +%F)-volumes.txt
done
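For anyone double-checking the arithmetic, the summation is simply adding up the extent lengths in the second column of the rbd diff output; the two input lines below are fabricated sample extents, not real cluster output:

```shell
# Sum the second column (extent length in bytes) of rbd diff-style
# output, exactly as in the loop above. Both input lines are fabricated
# sample data: a 1 MiB extent and a 2 MiB extent.
printf '0 1048576 data\n4194304 2097152 data\n' \
  | awk '{ SUM += $2 } END { print SUM }'
# prints 3145728 (1 MiB + 2 MiB in bytes)
```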
In the "volumes" pool the sum came to 56,455.43 GB (56 TB) - a value
close to the 59.1 TiB shown by the ceph df command.
for j in $(rbd ls volumes-dr)
do
  size=$(rbd diff "volumes-dr/$j" | awk '{ SUM += $2 } END { print SUM/1024/1024/1024 " GB" }')
  echo "$j;$size" >> /var/lib/report-volumes/$(date +%F)-volumes-dr.txt
done
In the "volumes-dr" pool the sum came to 40,726.51 GB (38 TB) - a much
lower value than the 143 TiB shown by the ceph df command.
Another characteristic of these two pools is that daily snapshots of all
images are taken, and each image has a retention policy (daily, weekly
or monthly). I thought this anomaly might be related to the snapshots,
but we have already purged all of them without any significant effect on
the pools' usage.
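To see how much of the usage belongs to snapshots versus the image heads, one option is to split the USED column of rbd du output on whether the NAME contains "@". The input below is fabricated sample data with plain byte values for simplicity (real rbd du output uses human-readable units); on a live cluster you would pipe "rbd du -p volumes-dr" instead:

```shell
# Split USED between image heads and their snapshots from rbd du-style
# output. Rows whose NAME contains '@' are snapshots. All values here
# are fabricated sample data, not real cluster output.
printf 'NAME PROVISIONED USED\nvol1@daily 100 20\nvol1 100 50\nvol2 100 30\n' \
  | awk 'NR > 1 { if ($1 ~ /@/) snap += $3; else head += $3 }
         END { print "head:", head, "snap:", snap }'
# prints: head: 80 snap: 20
```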
I've already searched forums about unclaimed space, but haven't found
anything concrete.
As for the mirrored pool in the DR datacenter, the value shown there is
much closer to the one obtained with rbd diff: 56.5 TiB.
We use pool-mode mirroring, and both the source and the destination
currently hold the same number of images: 223.
#CLUSTER DR
[root@ceph-dr01 ~]# ceph df detail
GLOBAL:
    SIZE      AVAIL      RAW USED    %RAW USED    OBJECTS
    217TiB    97.6TiB    119TiB      54.98        16.73M
POOLS:
    NAME          ID    QUOTA OBJECTS    QUOTA BYTES    USED       %USED    MAX AVAIL    OBJECTS     DIRTY      READ       WRITE      RAW USED
    images-dr     1     N/A              N/A            1.37TiB    6.89     18.5TiB      179953      179.95k    390MiB     198MiB     4.11TiB
    volumes-dr    3     N/A              N/A            56.5TiB    67.03    27.8TiB      16548170    16.55M     23.2GiB    59.0GiB    113TiB
Other infrastructure information:
4 virtualized monitors on CentOS 7.9.2009 (Core)
10 storage nodes (99 OSDs) with CentOS 7.9.2009 and Ceph 12.2.12
8 storage nodes (84 OSDs) with CentOS 7.9.2009 and Ceph 12.2.13
[root@ceph01]# ceph versions
{
    "mon": {
        "ceph version 12.2.13 (584a20eb0237c657dc0567da126be145106aa47e) luminous (stable)": 4
    },
    "mgr": {
        "ceph version 12.2.13 (584a20eb0237c657dc0567da126be145106aa47e) luminous (stable)": 4
    },
    "osd": {
        "ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777) luminous (stable)": 99,
        "ceph version 12.2.13 (584a20eb0237c657dc0567da126be145106aa47e) luminous (stable)": 84
    },
    "mds": {},
    "rbd-mirror": {
        "ceph version 12.2.13 (584a20eb0237c657dc0567da126be145106aa47e) luminous (stable)": 1
    },
    "overall": {
        "ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777) luminous (stable)": 99,
        "ceph version 12.2.13 (584a20eb0237c657dc0567da126be145106aa47e) luminous (stable)": 93
    }
}
One more piece of information: this anomaly apparently started after we
added the last 4 storage nodes, which have disks of a different size -
3.8 TB (the other 14 storage nodes use 4 TB disks). But then again, if
the disks were the problem, I would expect the other pool to be affected
as well.
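One check we have considered is whether the pool still holds RADOS objects belonging to images that no longer exist (orphans left behind by interrupted deletes, for example). The idea is to count the distinct rbd_data image-ID prefixes among the pool's objects and compare that with the image count from rbd ls. The object names below are fabricated; on a live cluster you would pipe "rados -p volumes-dr ls" instead:

```shell
# Count distinct rbd_data.<image_id> prefixes in a pool object listing.
# The three object names are fabricated sample data; two share the
# prefix "abc123", one has "def456", so two images are represented.
printf 'rbd_data.abc123.0000000000000000\nrbd_data.abc123.0000000000000001\nrbd_data.def456.0000000000000000\n' \
  | awk -F. '/^rbd_data\./ { print $2 }' | sort -u | wc -l
# prints 2
```

If this count were noticeably higher than the number of images reported by rbd ls, the extra prefixes could be matched against each image's block_name_prefix from rbd info to locate orphaned objects.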
Has anyone ever faced such a situation?
João Victor Soares.
Binario Cloud
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]