Hi all,

On a Mimic 13.2.8 cluster I observe a gradual increase of memory usage by OSD
daemons, in particular, under heavy load. For our spinners I use 
osd_memory_target=2G. The daemons overrun the 2G in virt size rather quickly 
and grow to something like 4G virtual. The real memory consumption stays more 
or less around the 2G of the target. There are some overshoots, but these go 
down again during periods with less load.

What I observe now is that the actual memory consumption slowly grows and OSDs
start using more than 2G of actual memory (physical + swap). I see this as
slowly growing swap usage despite having more RAM available (swappiness=10).
This indicates memory that is allocated but unused, or not accessed for a long
time, which usually points to a leak. Here are some heap stats:

Before restart:
osd.101 tcmalloc heap stats:------------------------------------------------
MALLOC:     3438940768 ( 3279.6 MiB) Bytes in use by application
MALLOC: +      5611520 (    5.4 MiB) Bytes in page heap freelist
MALLOC: +    257307352 (  245.4 MiB) Bytes in central cache freelist
MALLOC: +       357376 (    0.3 MiB) Bytes in transfer cache freelist
MALLOC: +      6727368 (    6.4 MiB) Bytes in thread cache freelists
MALLOC: +     25559040 (   24.4 MiB) Bytes in malloc metadata
MALLOC:   ------------
MALLOC: =   3734503424 ( 3561.5 MiB) Actual memory used (physical + swap)
MALLOC: +    575946752 (  549.3 MiB) Bytes released to OS (aka unmapped)
MALLOC:   ------------
MALLOC: =   4310450176 ( 4110.8 MiB) Virtual address space used
MALLOC:
MALLOC:         382884              Spans in use
MALLOC:             35              Thread heaps in use
MALLOC:           8192              Tcmalloc page size
------------------------------------------------
# ceph daemon osd.101 dump_mempools
{
    "mempool": {
        "by_pool": {
            "bloom_filter": {
                "items": 0,
                "bytes": 0
            },
            "bluestore_alloc": {
                "items": 4691828,
                "bytes": 37534624
            },
            "bluestore_cache_data": {
                "items": 0,
                "bytes": 0
            },
            "bluestore_cache_onode": {
                "items": 51,
                "bytes": 28968
            },
            "bluestore_cache_other": {
                "items": 5761276,
                "bytes": 46292425
            },
            "bluestore_fsck": {
                "items": 0,
                "bytes": 0
            },
            "bluestore_txc": {
                "items": 67,
                "bytes": 46096
            },
            "bluestore_writing_deferred": {
                "items": 208,
                "bytes": 26037057
            },
            "bluestore_writing": {
                "items": 52,
                "bytes": 6789398
            },
            "bluefs": {
                "items": 9478,
                "bytes": 183720
            },
            "buffer_anon": {
                "items": 291450,
                "bytes": 28093473
            },
            "buffer_meta": {
                "items": 546,
                "bytes": 34944
            },
            "osd": {
                "items": 98,
                "bytes": 1139152
            },
            "osd_mapbl": {
                "items": 78,
                "bytes": 8204276
            },
            "osd_pglog": {
                "items": 341944,
                "bytes": 120607952
            },
            "osdmap": {
                "items": 10687217,
                "bytes": 186830528
            },
            "osdmap_mapping": {
                "items": 0,
                "bytes": 0
            },
            "pgmap": {
                "items": 0,
                "bytes": 0
            },
            "mds_co": {
                "items": 0,
                "bytes": 0
            },
            "unittest_1": {
                "items": 0,
                "bytes": 0
            },
            "unittest_2": {
                "items": 0,
                "bytes": 0
            }
        },
        "total": {
            "items": 21784293,
            "bytes": 461822613
        }
    }
}
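
For comparison, the mempool total above covers only a fraction of what tcmalloc
reports as in use. As far as I understand, not everything an OSD allocates is
tracked in a mempool, so some gap is expected; the question is whether a gap
this large is. A minimal Python sketch of the arithmetic, using only the two
numbers from the output above (the variable and function names are mine):

```python
# Values copied from the "before restart" output of osd.101 above.
heap_in_use = 3438940768    # tcmalloc: "Bytes in use by application"
mempool_total = 461822613   # dump_mempools: mempool.total.bytes

MIB = 1024 * 1024


def to_mib(n_bytes):
    """Convert a byte count to MiB, rounded to one decimal place."""
    return round(n_bytes / MIB, 1)


# Memory in use by the application that no mempool accounts for.
untracked = heap_in_use - mempool_total

print(f"mempools:  {to_mib(mempool_total)} MiB")
print(f"heap used: {to_mib(heap_in_use)} MiB")
print(f"untracked: {to_mib(untracked)} MiB")
```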

Right after restart + health_ok:
osd.101 tcmalloc heap stats:------------------------------------------------
MALLOC:     1173996280 ( 1119.6 MiB) Bytes in use by application
MALLOC: +      3727360 (    3.6 MiB) Bytes in page heap freelist
MALLOC: +     25493688 (   24.3 MiB) Bytes in central cache freelist
MALLOC: +     17101824 (   16.3 MiB) Bytes in transfer cache freelist
MALLOC: +     20301904 (   19.4 MiB) Bytes in thread cache freelists
MALLOC: +      5242880 (    5.0 MiB) Bytes in malloc metadata
MALLOC:   ------------
MALLOC: =   1245863936 ( 1188.1 MiB) Actual memory used (physical + swap)
MALLOC: +     20488192 (   19.5 MiB) Bytes released to OS (aka unmapped)
MALLOC:   ------------
MALLOC: =   1266352128 ( 1207.7 MiB) Virtual address space used
MALLOC:
MALLOC:          54160              Spans in use
MALLOC:             33              Thread heaps in use
MALLOC:           8192              Tcmalloc page size
------------------------------------------------
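
To put the two dumps side by side, here is a small Python sketch that computes
how much each counter dropped across the restart. The numbers are copied from
the tcmalloc output above; the selection of counters and the helper name
delta_mib are my own choice:

```python
# (before restart, right after restart) in bytes, copied from the two
# tcmalloc dumps of osd.101 above.
stats = {
    "Bytes in use by application": (3438940768, 1173996280),
    "Bytes in central cache freelist": (257307352, 25493688),
    "Actual memory used (physical + swap)": (3734503424, 1245863936),
    "Virtual address space used": (4310450176, 1266352128),
}

MIB = 1024 * 1024


def delta_mib(before, after):
    """Difference between two byte counts in MiB, one decimal place."""
    return round((before - after) / MIB, 1)


for name, (before, after) in stats.items():
    print(f"{name}: {delta_mib(before, after)} MiB higher before restart")
```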

Am I looking at a memory leak here or are these heap stats expected?

I don't mind the swap usage, it has no impact. I'm just wondering if I
need to restart OSDs regularly. The "leakage" above occurred within only 2
months.
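
To judge whether regular restarts are needed, resident and swapped memory per
OSD process could be logged over time. A minimal sketch, assuming Linux and its
/proc/<pid>/status file; parse_status and read_status are names I made up, and
the sample numbers are invented, not taken from my cluster:

```python
def parse_status(text):
    """Return VmRSS and VmSwap (in kB) from the text of /proc/<pid>/status."""
    wanted = ("VmRSS", "VmSwap")
    result = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if key in wanted:
            # Lines look like "VmRSS:\t 2097152 kB"; take the number.
            result[key] = int(rest.split()[0])
    return result


def read_status(pid):
    """Read and parse /proc/<pid>/status for a running process (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_status(f.read())


# Example with invented numbers:
sample = "Name:\tceph-osd\nVmRSS:\t 2097152 kB\nVmSwap:\t 1048576 kB\n"
print(parse_status(sample))
```

Calling read_status with the PID of a ceph-osd process at regular intervals
would show whether VmSwap keeps growing while VmRSS stays near the target.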

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14