Hi Brett,

Can you enable debug_bluestore = 5 and debug_prioritycache = 5 on one of the OSDs that's showing the behavior?  You'll want to look in the logs for lines like these:


2019-07-18T19:34:42.587-0400 7f4048b8d700  5 prioritycache tune_memory target: 4294967296 mapped: 4260962304 unmapped: 856948736 heap: 5117911040 old mem: 2845415707 new mem: 2845415707
2019-07-18T19:34:33.527-0400 7f4048b8d700  5 bluestore.MempoolThread(0x55a6d330ead0) _resize_shards cache_size: 2845415707 kv_alloc: 1241513984 kv_used: 874833889 meta_alloc: 1258291200 meta_used: 889040246 data_alloc: 318767104 data_used: 0
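
For reference, one way to bump those two settings on a running OSD, using osd.12 as a stand-in for whichever OSD you pick (I haven't re-checked the exact syntax on 14.2.1, so treat this as a sketch):

    ceph config set osd.12 debug_bluestore 5/5
    ceph config set osd.12 debug_prioritycache 5/5

or, to inject them only into the running process without touching the mon config database:

    ceph tell osd.12 injectargs '--debug_bluestore 5/5 --debug_prioritycache 5/5'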

The first line tells you what your memory target is set to, how much memory is currently mapped, how much is unmapped (i.e. freed but not yet reclaimed by the kernel), the total heap size, and the old and new aggregate size for all of bluestore's caches.  The second line also shows the aggregate cache size, plus how much space is allocated and used for the kv, meta, and data caches.  If there's a leak somewhere in the OSD or bluestore, the autotuner will shrink the cache way down, but eventually it won't be able to contain the growth and the process will exceed the target size despite having only a tiny amount of bluestore cache.  If it's something else, like a huge amount of freed memory not being reclaimed by the kernel, you'll see a large amount of unmapped memory and a big heap size while the mapped memory stays near the target.  If it's a bug in the autotuner, we might see the mapped memory greatly exceeding the target.
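
If it helps, here's a quick-and-dirty script to pull the tune_memory numbers out of an OSD log and flag the three patterns above.  It's only a sketch based on the sample line in this mail; the field names come from that line and the thresholds are arbitrary placeholders, so adjust to taste:

#!/usr/bin/env python3
# Scan an OSD log for the prioritycache tune_memory lines and flag the
# three patterns described above.  Field names are taken from the sample
# log line in this thread; the thresholds are arbitrary placeholders.
import re
import sys

GIB = 1024 ** 3
TUNE_RE = re.compile(
    r'tune_memory target: (\d+) mapped: (\d+) unmapped: (\d+) '
    r'heap: (\d+) old mem: (\d+) new mem: (\d+)')

def scan(path):
    with open(path) as log:
        for line in log:
            m = TUNE_RE.search(line)
            if not m:
                continue
            target, mapped, unmapped, heap, old_mem, new_mem = map(int, m.groups())
            ts = line.split()[0]
            if mapped > 1.2 * target:
                # mapped memory well beyond the target -> autotuner not keeping up
                print("%s mapped %.1f GiB vs target %.1f GiB (autotuner?)"
                      % (ts, mapped / GIB, target / GIB))
            if unmapped > 0.5 * target:
                # lots of freed-but-unreclaimed memory -> big heap, kernel lagging
                print("%s unmapped %.1f GiB, heap %.1f GiB (unreclaimed free memory?)"
                      % (ts, unmapped / GIB, heap / GIB))
            if new_mem < 0.25 * target and mapped > target:
                # cache squeezed down while the process keeps growing -> leak?
                print("%s cache %.1f GiB but mapped %.1f GiB (leak outside the caches?)"
                      % (ts, new_mem / GIB, mapped / GIB))

if __name__ == '__main__':
    scan(sys.argv[1])

Run it against the OSD log, e.g. python3 scan_tune.py /var/log/ceph/ceph-osd.12.log (the path is just a guess at your layout).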


Mark


On 7/18/19 4:02 PM, Brett Kelly wrote:

Hello,

We have a Nautilus cluster exhibiting what looks like this bug: https://tracker.ceph.com/issues/39618

No matter what osd_memory_target is set to (currently 2147483648), each OSD process surpasses this value, peaks around ~4.0 GB, and then eventually starts using swap. The cluster stays stable for about a week, then starts running into OOM issues, kills off OSDs, and requires a reboot of each node to get back to a stable state.

Has anyone run into similar/workarounds ?

Ceph version: 14.2.1; clients are RGW.

CentOS Linux release 7.6.1810 (Core)

Kernel: 3.10.0-957.12.1.el7.x86_64

256 GB RAM per OSD node, 60 OSDs per node.


Thanks,

--
Brett Kelly


_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
