Hello, 
  
We've had a similar situation recently where OSDs would use way more memory 
than osd_memory_target and get OOM killed by the kernel. 
It was due to a kernel bug related to cgroups [1]. 
  
If num_cgroups in the output below keeps increasing over time, you may be hitting this bug.
 
  
$ cat /proc/cgroups | grep -e subsys -e blkio | column -t 
   #subsys_name  hierarchy  num_cgroups  enabled 
   blkio         4          1099         1 
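
To tell whether the counter is actually growing rather than just high, it can be sampled a few times. This is only a rough sketch: the function names num_cgroups and watch_num_cgroups are made up for illustration, and the sample count and interval are arbitrary defaults, not recommended values.

```shell
#!/bin/sh
# Print the num_cgroups column for one subsystem from a /proc/cgroups-style file.
# $1 = subsystem name (default: blkio), $2 = file to read (default: /proc/cgroups)
num_cgroups() {
    awk -v subsys="${1:-blkio}" '$1 == subsys { print $3 }' "${2:-/proc/cgroups}"
}

# Hypothetical helper: print a timestamped blkio num_cgroups value several times.
# $1 = number of samples (default: 5), $2 = seconds between samples (default: 60),
# $3 = file to read (default: /proc/cgroups)
watch_num_cgroups() {
    samples="${1:-5}"
    interval="${2:-60}"
    file="${3:-/proc/cgroups}"
    i=0
    while [ "$i" -lt "$samples" ]; do
        printf '%s blkio num_cgroups=%s\n' \
            "$(date '+%Y-%m-%d %H:%M:%S')" "$(num_cgroups blkio "$file")"
        i=$((i + 1))
        # Sleep between samples, but not after the last one.
        if [ "$i" -lt "$samples" ]; then
            sleep "$interval"
        fi
    done
}
```

After sourcing these functions, running `watch_num_cgroups 10 300` on an OSD node would print ten samples five minutes apart. A value that only ever climbs between samples is the pattern described above.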
  
If you are hitting this bug, upgrading the kernel on the OSD nodes should
resolve it. If you can't access the Red Hat KB [1], let me know your nodes'
current kernel version and I'll check for you.

Regards,
Frédéric.
 
  
[1] https://access.redhat.com/solutions/7014337     

-----Original message-----

From: huxiaoyu <huxia...@horebdata.cn>
To: ceph-users <ceph-users@ceph.io>
Sent: Wednesday, 10 January 2024 19:21 CET
Subject: [ceph-users] Ceph Nautilous 14.2.22 slow OSD memory leak?

Dear Ceph folks, 

I am responsible for two Ceph clusters running Nautilus 14.2.22, one with
replication 3 and the other with EC 4+2. After around 400 days of running
quietly and smoothly, both clusters have recently developed a similar problem:
some OSDs consume ca. 18 GB of memory while the memory target is set to 2 GB.

What could be going wrong in the background? Does this point to a slow OSD
memory leak in 14.2.22 that I am not yet aware of?

I would highly appreciate it if someone could provide any clues, ideas, or
comments.

best regards, 

Samuel 



huxia...@horebdata.cn 
_______________________________________________ 
ceph-users mailing list -- ceph-users@ceph.io 
To unsubscribe send an email to ceph-users-le...@ceph.io   