Hi Adam,
please see my comments inline.
Thanks,
Igor
On 05.05.2025 18:17, Adam Prycki wrote:
Hi Igor,
It looks like you're right.
ceph tell osd.1750 status
{
"cluster_fsid": "bec60cda-a306-11ed-abd9-75488d4e8f4a",
"osd_fsid": "388906f8-1df8-45a2-9895-067ee2e0c055",
"whoami": 1750,
"state": "active",
"maps": "[1502335~261370]",
"oldest_map": "1502335",
"newest_map": "1763704",
"cluster_osdmap_trim_lower_bound": 1502335,
"num_pgs": 0
}
One of the OSDs I've checked has about 261K maps.
Could this cause BlueStore to grow to ~380 GiB?
You'd better check object sizes/counts in the meta pool using
ceph-objectstore-tool and estimate the totals:

ceph-objectstore-tool --data-path <path-to-osd> --op meta-list > meta_list
wc -l meta_list    # total number of meta objects

ceph-objectstore-tool --data-path <path-to-osd> --pgid meta <oid> dump | grep size
The latter command reports the onode size for a given object. Just use a
few oids from the meta_list file corresponding to specific onode types
(osdmap* and inc_osdmap* are of particular interest; other types are worth
checking if they appear in bulk) - see the sketch below.
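For example, a rough sketch (run with the OSD stopped; assuming the usual
osdmap.<epoch> / inc_osdmap.<epoch> naming and that a meta_list line can be
passed back to the tool verbatim as <oid> - quote it, as the oids are JSON):

grep -c inc_osdmap meta_list                   # incremental osdmap objects
grep osdmap meta_list | grep -vc inc_osdmap    # full osdmap objects

ceph-objectstore-tool --data-path <path-to-osd> --pgid meta \
    "$(grep osdmap meta_list | grep -v inc_osdmap | head -1)" dump | grep size
ceph-objectstore-tool --data-path <path-to-osd> --pgid meta \
    "$(grep -m1 inc_osdmap meta_list)" dump | grep size

Multiplying the two counts by the typical sizes reported there should give
a reasonable estimate of how much space the stored osdmaps account for.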
What settings could affect the number of osdmaps stored on an OSD?
The only thing that comes to mind is mon_min_osdmap_epochs, which I
configured to 2000 a while ago.
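For reference, the value the monitors are actually using (and any other
osdmap-related overrides) can be confirmed with something like:

ceph config get mon mon_min_osdmap_epochs
ceph config dump | grep -i osdmap

Either way, 2000 retained epochs is nowhere near the ~261K maps seen above,
so this setting alone wouldn't explain the growth.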
Osdmaps should be trimmed automatically in a healthy cluster. Perhaps the
ongoing rebalancing (or some other issue) prevents that. The first
question would be how the osdmap epochs evolve over time: is oldest_map
increasing? Is the delta decreasing? See the snippet below for a quick way
to check.
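A minimal sketch for tracking that, assuming jq is available on the host
(oldest_map/newest_map are strings in the status output, hence tonumber):

ceph tell osd.1750 status | jq '(.newest_map|tonumber) - (.oldest_map|tonumber)'

Re-running it every few minutes shows whether the backlog is shrinking.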
We are also running a long rebalancing on this cluster. At the beginning
of the year we expanded this cluster and we are still waiting for the
HDD pool to finish rebalancing. We have only 2% of data left to
rebalance. It's taking so long because I wanted to stay with the mclock
scheduler at first.
Rebalancing shouldn't affect the SSD pool directly, but it keeps the HDD
pools from being 100% active+clean.
Evacuating all PGs from an OSD doesn't lower its usage.
For example (ceph osd df tree columns):
ID    CLASS  WEIGHT  REWEIGHT  SIZE     RAW USE  DATA     OMAP     META     AVAIL   %USE   VAR   PGS  STATUS
1814  ssd    0       1.00000   447 GiB  382 GiB  381 GiB  351 MiB  1.0 GiB  65 GiB  85.44  1.00  0    up
I assume it's a BlueStore issue and not a PG/CRUSH related one.
It's an issue with osdmaps not being properly trimmed; it's unrelated to
BlueStore.
Best regards
Adam
On 05.05.2025 at 16:20, Igor Fedotov wrote:
Hi Adam,
I recall a case where pretty high OSD utilization was caused by osdmaps
not being trimmed.
You might want to check the output of the 'ceph tell osd.N status'
command and compare the oldest_map vs. newest_map values.
A delta of hundreds of thousands of epochs would confirm the case.
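For instance, a quick one-shot check (N is the OSD id):

ceph tell osd.N status | grep -E 'oldest_map|newest_map'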
Thanks,
Igor
On 05.05.2025 16:41, Adam Prycki wrote:
Sorry guys, I've forgotten to add attachments to my previous email.
In attachments:
ceph osd crush rule dump
ceph osd df tree
Best regards
Adam Prycki
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io