Hi, 

Following on from various other woes, we are currently seeing odd and unhelpful 
behaviour from some OSDs on our cluster. 
A minority of OSDs have runaway memory usage, rising to tens of GB, whilst 
other OSDs on the same host behave sensibly. As far as we can tell, this started 
when we moved from Mimic to Nautilus.

In the best case, this causes some nodes to start swapping (which reduces their 
performance); in the worst case, it triggers the OOM killer.

I have dumped the mempools for these OSDs, which shows that almost all of the 
memory is in the buffer_anon pool.
The perf dump shows that the OSD is targeting the 4GB memory limit set for it, 
but for some reason is unable to get back down to it because of what is sitting 
in the priority cache (which seems to be mostly what is filling buffer_anon).

Can anyone advise on what we should do next?

(mempool dump and excerpt of perf dump at end of email).
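
For anyone wanting to reproduce these on their own OSDs, they are the output of 
the usual admin-socket commands:

    ceph daemon osd.<id> dump_mempools
    ceph daemon osd.<id> perf dump

A short illustrative script comparing buffer_anon against the prioritycache 
target follows the dumps.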

Thanks for any help,

Sam Skipsey

MEMPOOL DUMP
{
    "mempool": {
        "by_pool": {
            "bloom_filter": {
                "items": 0,
                "bytes": 0
            },
            "bluestore_alloc": {
                "items": 5629372,
                "bytes": 45034976
            },
            "bluestore_cache_data": {
                "items": 127,
                "bytes": 65675264
            },
            "bluestore_cache_onode": {
                "items": 8275,
                "bytes": 4634000
            },
            "bluestore_cache_other": {
                "items": 2967913,
                "bytes": 62469216
            },
            "bluestore_fsck": {
                "items": 0,
                "bytes": 0
            },
            "bluestore_txc": {
                "items": 145,
                "bytes": 100920
            },
            "bluestore_writing_deferred": {
                "items": 335,
                "bytes": 13160884
            },
            "bluestore_writing": {
                "items": 1406,
                "bytes": 5379120
            },
            "bluefs": {
                "items": 1105,
                "bytes": 24376
            },
            "buffer_anon": {
                "items": 13705143,
                "bytes": 40719040439
            },
            "buffer_meta": {
                "items": 6820143,
                "bytes": 600172584
            },
            "osd": {
                "items": 96,
                "bytes": 1138176
            },
            "osd_mapbl": {
                "items": 59,
                "bytes": 7022524
            },
            "osd_pglog": {
                "items": 491049,
                "bytes": 156701043
            },
            "osdmap": {
                "items": 107885,
                "bytes": 1723616
            },
            "osdmap_mapping": {
                "items": 0,
                "bytes": 0
            },
            "pgmap": {
                "items": 0,
                "bytes": 0
            },
            "mds_co": {
                "items": 0,
                "bytes": 0
            },
            "unittest_1": {
                "items": 0,
                "bytes": 0
            },
            "unittest_2": {
                "items": 0,
                "bytes": 0
            }
        },
        "total": {
            "items": 29733053,
            "bytes": 41682277138
        }
    }
}

PERF DUMP excerpt:

"prioritycache": {
        "target_bytes": 4294967296,
        "mapped_bytes": 38466584576,
        "unmapped_bytes": 425984,
        "heap_bytes": 38467010560,
        "cache_bytes": 134217728
    },
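
And, in case it is useful, a minimal sketch of the comparison described above. 
It assumes the full dump_mempools and perf dump outputs have been saved as 
mempool.json and perf.json; those file names are purely illustrative.

#!/usr/bin/env python3
# Minimal sketch: compare the buffer_anon mempool against the priority
# cache target. Assumes the full "dump_mempools" and "perf dump" outputs
# were saved as mempool.json and perf.json (illustrative names only).
import json

with open("mempool.json") as f:
    by_pool = json.load(f)["mempool"]["by_pool"]
with open("perf.json") as f:
    perf = json.load(f)

gib = 1024 ** 3
buffer_anon = by_pool["buffer_anon"]["bytes"]
target = perf["prioritycache"]["target_bytes"]
mapped = perf["prioritycache"]["mapped_bytes"]

print(f"buffer_anon : {buffer_anon / gib:6.1f} GiB")
print(f"target_bytes: {target / gib:6.1f} GiB")
print(f"mapped_bytes: {mapped / gib:6.1f} GiB")
print(f"overshoot   : {(mapped - target) / gib:6.1f} GiB above target")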
