Thank you for the quick response, Dyweni!

We are using FileStore, as this cluster was upgraded from
Hammer --> Jewel --> Luminous 12.2.8. All nodes have 16 x 2 TB HDDs. The R730xd
nodes have 128 GB of RAM and the R740xd nodes have 96 GB. Everything else is
the same.
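
As a quick sanity check (just a sketch; "0" below is an example OSD ID), the
backend each OSD reports can be confirmed with:

  # List the objectstore backend (filestore vs bluestore) reported by all OSDs
  ceph osd metadata | grep '"osd_objectstore"'

  # Or for a single OSD, e.g. osd.0
  ceph osd metadata 0 | grep '"osd_objectstore"'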

Thanks,
Pardhiv Karri

On Fri, Dec 21, 2018 at 1:43 PM Dyweni - Ceph-Users <6exbab4fy...@dyweni.com>
wrote:

> Hi,
>
>
> You could be running out of memory due to the default Bluestore cache
> sizes.
>
>
> How many disks/OSDs in the R730xd versus the R740xd?  How much memory in
> each server type?  How many are HDD versus SSD?  Are you running Bluestore?
>
>
> OSDs in Luminous that run Bluestore allocate memory to use as a
> "cache", since the kernel-provided page cache is not available to
> Bluestore.  By default, Bluestore will use 1 GB of memory for each HDD and
> 3 GB of memory for each SSD.  OSDs do not allocate all of that memory up
> front, but grow into it as it is used.  This cache is in addition to any
> other memory the OSD uses.
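>
> As a rough illustration (assuming 16 HDD-backed Bluestore OSDs per node, as
> in your setup, and counting only the cache itself):
>
>   16 OSDs x 1 GB default HDD cache = 16 GB per node for the cache alone,
>   on top of whatever other memory each OSD process needs.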
>
>
> Check out the bluestore_cache_* values (these are specified in bytes) in
> the manual cache sizing section of the docs:
> http://docs.ceph.com/docs/master/rados/configuration/bluestore-config-ref/
> Note that the automatic cache sizing feature wasn't added until 12.2.9.
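>
> A minimal ceph.conf sketch (the 512 MiB value is just an example; all values
> are in bytes):
>
>   [osd]
>   # Cap the Bluestore cache per OSD (example: 512 MiB)
>   bluestore_cache_size = 536870912
>   # Per-device-class settings also exist; the defaults are:
>   # bluestore_cache_size_hdd = 1073741824   (1 GB)
>   # bluestore_cache_size_ssd = 3221225472   (3 GB)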
>
>
>
> As an example, I have OSDs running on 32-bit/armhf nodes.  These nodes
> have 2 GB of memory.  I run 1 Bluestore OSD on each node.  In my ceph.conf
> file, I have 'bluestore cache size = 536870912' and 'bluestore cache kv max
> = 268435456'.  I see approximately 1.35-1.4 GB used by each OSD.
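>
> To check what an OSD is actually doing at runtime (a sketch, assuming the
> admin socket is available on the OSD host; osd.0 is just an example ID):
>
>   # Show the cache-related settings the OSD is running with
>   ceph daemon osd.0 config show | grep bluestore_cache
>   # Break down the OSD's memory usage by pool (cache, buffers, etc.)
>   ceph daemon osd.0 dump_mempools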
>
>
>
>
> On 2018-12-21 15:19, Pardhiv Karri wrote:
>
> Hi,
>
> We have a Luminous cluster which was upgraded from Hammer --> Jewel -->
> Luminous 12.2.8 recently. Post upgrade we are seeing an issue with a few
> nodes where they are running out of memory and dying. In the logs we are
> seeing the OOM killer. We did not have this issue before the upgrade. The
> only difference is that the nodes without any issue are R730xd and the ones
> with the memory leak are R740xd. The hardware vendor doesn't see anything
> wrong with the hardware. From the Ceph end we are not seeing any issue with
> running the cluster; the only issue is the memory leak. Right now we are
> proactively rebooting the nodes in a timely manner to avoid crashes. On one
> R740xd node we set all the OSD weights to 0.0 and there is no memory leak
> there. Any pointers to fix the issue would be helpful.
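>
> (For reference, the OOM kills and per-OSD memory growth can be tracked with
> standard tools; a rough sketch:
>
>   # Kernel log entries from the OOM killer
>   dmesg -T | grep -i 'out of memory'
>   # ceph-osd processes sorted by resident memory
>   ps -eo pid,rss,comm --sort=-rss | grep ceph-osd | head
> )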
>
> Thanks,
> *Pardhiv Karri*
>

-- 
*Pardhiv Karri*
"Rise and Rise again until LAMBS become LIONS"
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
