Hi,

I forgot to mention that the config diff may show a lower value than the
one that was really running (8 MB): memory usage was still high, so I had
prepared a new configuration with a lower limit (5 MB). I haven't reloaded
the daemons yet, but the configuration may have been loaded again today,
which would explain why the MDS is using less than 1 GB of RAM right now.
I haven't rebooted the machine, but if the daemon was killed for high
memory usage, the new configuration may have been loaded when it restarted.

Greetings!


2018-07-23 21:07 GMT+02:00 Daniel Carrasco <[email protected]>:

> Thanks!
>
> It's true that I've seen continuous memory growth, but I hadn't thought of
> a memory leak. I don't remember exactly how many hours it took to fill the
> memory, but I estimate it was about 14 hours.
>
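Taking the estimate above at face value (the MDS reaching roughly 15 GB in
about 14 hours), the average growth rate is a simple division, and the same
calculation could be fed from the MDS's own counters next time instead of
from memory. A minimal sketch in Python; the function names are
hypothetical, the GB figures are treated as GiB, and it assumes the
`mds_mem.rss` value in `ceph daemon mds.<name> perf dump` output is in KiB.

```python
import json


def rss_gib(perf_dump_json: str) -> float:
    """Return mds_mem.rss from `perf dump` JSON, converted from KiB to GiB.

    Assumption: the rss counter is reported in KiB.
    """
    data = json.loads(perf_dump_json)
    return data["mds_mem"]["rss"] / (1024 * 1024)


def growth_rate_gib_per_hour(start_gib: float, end_gib: float, hours: float) -> float:
    """Average memory growth between two samples, in GiB per hour."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return (end_gib - start_gib) / hours


# Rough figures from this thread, assuming growth starts near zero:
# ~15 GiB reached in ~14 hours.
rate = growth_rate_gib_per_hour(0.0, 15.0, 14.0)
print(f"average growth: {rate:.2f} GiB/h")  # prints "average growth: 1.07 GiB/h"
```

Saving `perf dump` output at regular intervals (for example from cron) and
passing consecutive samples through `rss_gib` would give the memory log
that is missing for the period when the problem was worst.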
> With the new configuration, memory seems to grow slowly and stop when it
> reaches 5-6 GB. Sometimes the daemon seems to flush its memory, dropping
> to less than 1 GB, and then it slowly grows back to 5-6 GB.
>
> Today, for reasons I don't understand, memory dropped to less than 1 GB
> and it is still there 8 hours later, although I haven't changed anything
> on the Ceph cluster. The only thing I did was deploy a git repository with
> some changes.
>
> Some nodes are still on version 12.2.5: when I detected this problem I
> didn't know whether it was caused by the latest version, so I stopped the
> update. The node running the active MDS is on the latest version (12.2.7),
> and I've scheduled the update of the remaining nodes for Thursday.
>
> A graph of memory usage over the last few days with that configuration:
> https://imgur.com/a/uSsvBi4
>
> I don't have data from when the problem was at its worst (512 MB MDS
> memory limit and 15-16 GB of usage), because memory usage was not logged.
> I only have a heap stats dump taken while the daemon was filling the
> memory:
>
> # ceph tell mds.kavehome-mgto-pro-fs01 heap stats
> 2018-07-19 00:43:46.142560 7f5a7a7fc700  0 client.1318388 ms_handle_reset on 10.22.0.168:6800/1129848128
> 2018-07-19 00:43:46.181133 7f5a7b7fe700  0 client.1318391 ms_handle_reset on 10.22.0.168:6800/1129848128
> mds.kavehome-mgto-pro-fs01 tcmalloc heap stats:------------------------------------------------
> MALLOC:     9982980144 ( 9520.5 MiB) Bytes in use by application
> MALLOC: +            0 (    0.0 MiB) Bytes in page heap freelist
> MALLOC: +    172148208 (  164.2 MiB) Bytes in central cache freelist
> MALLOC: +     19031168 (   18.1 MiB) Bytes in transfer cache freelist
> MALLOC: +     23987552 (   22.9 MiB) Bytes in thread cache freelists
> MALLOC: +     20869280 (   19.9 MiB) Bytes in malloc metadata
> MALLOC:   ------------
> MALLOC: =  10219016352 ( 9745.6 MiB) Actual memory used (physical + swap)
> MALLOC: +   3913687040 ( 3732.4 MiB) Bytes released to OS (aka unmapped)
> MALLOC:   ------------
> MALLOC: =  14132703392 (13478.0 MiB) Virtual address space used
> MALLOC:
> MALLOC:          63875              Spans in use
> MALLOC:             16              Thread heaps in use
> MALLOC:           8192              Tcmalloc page size
> ------------------------------------------------
> Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
> Bytes released to the OS take up virtual address space but no physical memory.
>
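The tcmalloc summary above is additive: bytes in use by the application
plus the freelists and malloc metadata give "Actual memory used", and adding
the bytes released to the OS gives the virtual address space used. A small
Python sketch that parses the quoted `MALLOC:` lines and re-derives both
totals; the regex is written only for this output format and is an
assumption, not part of any tcmalloc API.

```python
import re

# Lines copied from the `heap stats` output above (separator lines omitted).
HEAP_STATS = """\
MALLOC:     9982980144 ( 9520.5 MiB) Bytes in use by application
MALLOC: +            0 (    0.0 MiB) Bytes in page heap freelist
MALLOC: +    172148208 (  164.2 MiB) Bytes in central cache freelist
MALLOC: +     19031168 (   18.1 MiB) Bytes in transfer cache freelist
MALLOC: +     23987552 (   22.9 MiB) Bytes in thread cache freelists
MALLOC: +     20869280 (   19.9 MiB) Bytes in malloc metadata
MALLOC: =  10219016352 ( 9745.6 MiB) Actual memory used (physical + swap)
MALLOC: +   3913687040 ( 3732.4 MiB) Bytes released to OS (aka unmapped)
MALLOC: =  14132703392 (13478.0 MiB) Virtual address space used
"""

# "MALLOC:", optional +/= sign, byte count, "(N MiB)", then the label.
LINE = re.compile(r"^MALLOC:\s*[+=]?\s*(\d+)\s*\(\s*[\d.]+ MiB\)\s*(.+)$")


def parse(text: str) -> dict:
    """Map each label to its byte count."""
    out = {}
    for line in text.splitlines():
        m = LINE.match(line)
        if m:
            out[m.group(2).strip()] = int(m.group(1))
    return out


stats = parse(HEAP_STATS)
in_use = stats["Bytes in use by application"]
freelists = sum(v for k, v in stats.items() if "freelist" in k)
actual = in_use + freelists + stats["Bytes in malloc metadata"]
virtual_used = actual + stats["Bytes released to OS (aka unmapped)"]

print("actual matches:", actual == stats["Actual memory used (physical + swap)"])
print("virtual matches:", virtual_used == stats["Virtual address space used"])
print(f"freelists: {freelists / 2**20:.1f} MiB")
```

For this dump the freelists add up to about 205 MiB, so almost all of the
roughly 9.5 GiB of actual memory is reported as in use by the MDS itself
rather than cached by tcmalloc.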
>
>
> Here's the config diff:
> ------------------------------------------------------------
> {
>     "diff": {
>         "current": {
>             "admin_socket": "/var/run/ceph/ceph-mds.kavehome-mgto-pro-fs01.asok",
>             "auth_client_required": "cephx",
>             "bluestore_cache_size_hdd": "80530636",
>             "bluestore_cache_size_ssd": "80530636",
>             "err_to_stderr": "true",
>             "fsid": "f015f888-6e0c-4203-aea8-ef0f69ef7bd8",
>             "internal_safe_to_start_threads": "true",
>             "keyring": "/var/lib/ceph/mds/ceph-kavehome-mgto-pro-fs01/keyring",
>             "log_file": "/var/log/ceph/ceph-mds.kavehome-mgto-pro-fs01.log",
>             "log_max_recent": "10000",
>             "log_to_stderr": "false",
>             "mds_cache_memory_limit": "53687091",
>             "mds_data": "/var/lib/ceph/mds/ceph-kavehome-mgto-pro-fs01",
>             "mgr_data": "/var/lib/ceph/mgr/ceph-kavehome-mgto-pro-fs01",
>             "mon_cluster_log_file": "default=/var/log/ceph/ceph.$channel.log cluster=/var/log/ceph/ceph.log",
>             "mon_data": "/var/lib/ceph/mon/ceph-kavehome-mgto-pro-fs01",
>             "mon_debug_dump_location": "/var/log/ceph/ceph-mds.kavehome-mgto-pro-fs01.tdump",
>             "mon_host": "10.22.0.168,10.22.0.140,10.22.0.127",
>             "mon_initial_members": "kavehome-mgto-pro-fs01, kavehome-mgto-pro-fs02, kavehome-mgto-pro-fs03",
>             "osd_data": "/var/lib/ceph/osd/ceph-kavehome-mgto-pro-fs01",
>             "osd_journal": "/var/lib/ceph/osd/ceph-kavehome-mgto-pro-fs01/journal",
>             "public_addr": "10.22.0.168:0/0",
>             "public_network": "10.22.0.0/24",
>             "rgw_data": "/var/lib/ceph/radosgw/ceph-kavehome-mgto-pro-fs01",
>             "setgroup": "ceph",
>             "setuser": "ceph"
>         },
>         "defaults": {
>             "admin_socket": "",
>             "auth_client_required": "cephx, none",
>             "bluestore_cache_size_hdd": "1073741824",
>             "bluestore_cache_size_ssd": "3221225472",
>             "err_to_stderr": "false",
>             "fsid": "00000000-0000-0000-0000-000000000000",
>             "internal_safe_to_start_threads": "false",
>             "keyring": "/etc/ceph/$cluster.$name.keyring,/etc/ceph/$cluster.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,",
>             "log_file": "",
>             "log_max_recent": "500",
>             "log_to_stderr": "true",
>             "mds_cache_memory_limit": "1073741824",
>             "mds_data": "/var/lib/ceph/mds/$cluster-$id",
>             "mgr_data": "/var/lib/ceph/mgr/$cluster-$id",
>             "mon_cluster_log_file": "default=/var/log/ceph/$cluster.$channel.log cluster=/var/log/ceph/$cluster.log",
>             "mon_data": "/var/lib/ceph/mon/$cluster-$id",
>             "mon_debug_dump_location": "/var/log/ceph/$cluster-$name.tdump",
>             "mon_host": "",
>             "mon_initial_members": "",
>             "osd_data": "/var/lib/ceph/osd/$cluster-$id",
>             "osd_journal": "/var/lib/ceph/osd/$cluster-$id/journal",
>             "public_addr": "-",
>             "public_network": "",
>             "rgw_data": "/var/lib/ceph/radosgw/$cluster-$id",
>             "setgroup": "",
>             "setuser": ""
>         }
>     },
>     "unknown": []
> }
> ------------------------------------------------------------
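The diff stores sizes as raw byte counts, which are easy to misread. A
quick Python conversion of the size-related values quoted above into MiB;
the dictionary only restates numbers from the diff, nothing is read from a
live cluster, and `to_mib` is a hypothetical helper.

```python
MIB = 1024 ** 2

# Byte values copied from the "current" and "defaults" sections of the diff.
VALUES = {
    "mds_cache_memory_limit (current)": 53687091,
    "mds_cache_memory_limit (default)": 1073741824,
    "bluestore_cache_size_hdd (current)": 80530636,
    "bluestore_cache_size_hdd (default)": 1073741824,
    "bluestore_cache_size_ssd (current)": 80530636,
    "bluestore_cache_size_ssd (default)": 3221225472,
}


def to_mib(n_bytes: int) -> float:
    """Convert a byte count to MiB (2**20 bytes)."""
    return n_bytes / MIB


for name, n in VALUES.items():
    print(f"{name:36} {n:>12} bytes = {to_mib(n):8.1f} MiB")
```

Converted this way, the running `mds_cache_memory_limit` (53687091) comes
out at about 51.2 MiB (0.05 GiB), and both BlueStore cache sizes at about
76.8 MiB, against a 1024 MiB default for the MDS cache limit.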
>
>
>
> Perf dump:
> ------------------------------------------------------------
> {
>     "AsyncMessenger::Worker-0": {
>         "msgr_recv_messages": 1350895,
>         "msgr_send_messages": 1593759,
>         "msgr_recv_bytes": 301786293,
>         "msgr_send_bytes": 341807191,
>         "msgr_created_connections": 148,
>         "msgr_active_connections": 45,
>         "msgr_running_total_time": 119.217157290,
>         "msgr_running_send_time": 39.714493374,
>         "msgr_running_recv_time": 127.455260807,
>         "msgr_running_fast_dispatch_time": 0.117634930
>     },
>     "AsyncMessenger::Worker-1": {
>         "msgr_recv_messages": 2996114,
>         "msgr_send_messages": 3113274,
>         "msgr_recv_bytes": 804875332,
>         "msgr_send_bytes": 1231962873,
>         "msgr_created_connections": 151,
>         "msgr_active_connections": 48,
>         "msgr_running_total_time": 248.962533700,
>         "msgr_running_send_time": 83.497214869,
>         "msgr_running_recv_time": 547.534653813,
>         "msgr_running_fast_dispatch_time": 0.125151678
>     },
>     "AsyncMessenger::Worker-2": {
>         "msgr_recv_messages": 1793419,
>         "msgr_send_messages": 2117240,
>         "msgr_recv_bytes": 1425674729,
>         "msgr_send_bytes": 871324466,
>         "msgr_created_connections": 325,
>         "msgr_active_connections": 54,
>         "msgr_running_total_time": 160.001753142,
>         "msgr_running_send_time": 49.679463024,
>         "msgr_running_recv_time": 205.535692064,
>         "msgr_running_fast_dispatch_time": 4.350479591
>     },
>     "finisher-PurgeQueue": {
>         "queue_len": 0,
>         "complete_latency": {
>             "avgcount": 755,
>             "sum": 0.022316252,
>             "avgtime": 0.000029557
>         }
>     },
>     "mds": {
>         "request": 4942944,
>         "reply": 489638,
>         "reply_latency": {
>             "avgcount": 489638,
>             "sum": 771.955019623,
>             "avgtime": 0.001576583
>         },
>         "forward": 4453296,
>         "dir_fetch": 101036,
>         "dir_commit": 3,
>         "dir_split": 0,
>         "dir_merge": 0,
>         "inode_max": 2147483647,
>         "inodes": 505,
>         "inodes_top": 96,
>         "inodes_bottom": 398,
>         "inodes_pin_tail": 11,
>         "inodes_pinned": 367,
>         "inodes_expired": 1556356,
>         "inodes_with_caps": 325,
>         "caps": 1192,
>         "subtrees": 16,
>         "traverse": 4956673,
>         "traverse_hit": 496867,
>         "traverse_forward": 4450841,
>         "traverse_discover": 166,
>         "traverse_dir_fetch": 1657,
>         "traverse_remote_ino": 0,
>         "traverse_lock": 19,
>         "load_cent": 494278118,
>         "q": 0,
>         "exported": 1187,
>         "exported_inodes": 664127,
>         "imported": 947,
>         "imported_inodes": 76628
>     },
>     "mds_cache": {
>         "num_strays": 0,
>         "num_strays_delayed": 0,
>         "num_strays_enqueuing": 0,
>         "strays_created": 124,
>         "strays_enqueued": 124,
>         "strays_reintegrated": 0,
>         "strays_migrated": 0,
>         "num_recovering_processing": 0,
>         "num_recovering_enqueued": 0,
>         "num_recovering_prioritized": 0,
>         "recovery_started": 0,
>         "recovery_completed": 0,
>         "ireq_enqueue_scrub": 0,
>         "ireq_exportdir": 1189,
>         "ireq_flush": 0,
>         "ireq_fragmentdir": 0,
>         "ireq_fragstats": 0,
>         "ireq_inodestats": 0
>     },
>     "mds_log": {
>         "evadd": 125666,
>         "evex": 116984,
>         "evtrm": 116984,
>         "ev": 117582,
>         "evexg": 0,
>         "evexd": 933,
>         "segadd": 138,
>         "segex": 138,
>         "segtrm": 138,
>         "seg": 129,
>         "segexg": 0,
>         "segexd": 1,
>         "expos": 25715287703,
>         "wrpos": 25862332030,
>         "rdpos": 25663431097,
>         "jlat": {
>             "avgcount": 23473,
>             "sum": 98.111299299,
>             "avgtime": 0.004179751
>         },
>         "replayed": 108900
>     },
>     "mds_mem": {
>         "ino": 507,
>         "ino+": 1579334,
>         "ino-": 1578827,
>         "dir": 312,
>         "dir+": 101932,
>         "dir-": 101620,
>         "dn": 529,
>         "dn+": 1580751,
>         "dn-": 1580222,
>         "cap": 1192,
>         "cap+": 1825843,
>         "cap-": 1824651,
>         "rss": 258840,
>         "heap": 313880,
>         "buf": 0
>     },
>     "mds_server": {
>         "dispatch_client_request": 5081829,
>         "dispatch_server_request": 540,
>         "handle_client_request": 4942944,
>         "handle_client_session": 233505,
>         "handle_slave_request": 846,
>         "req_create": 128,
>         "req_getattr": 38805,
>         "req_getfilelock": 0,
>         "req_link": 0,
>         "req_lookup": 242216,
>         "req_lookuphash": 0,
>         "req_lookupino": 0,
>         "req_lookupname": 2,
>         "req_lookupparent": 0,
>         "req_lookupsnap": 0,
>         "req_lssnap": 0,
>         "req_mkdir": 0,
>         "req_mknod": 0,
>         "req_mksnap": 0,
>         "req_open": 2155,
>         "req_readdir": 206315,
>         "req_rename": 21,
>         "req_renamesnap": 0,
>         "req_rmdir": 0,
>         "req_rmsnap": 0,
>         "req_rmxattr": 0,
>         "req_setattr": 2,
>         "req_setdirlayout": 0,
>         "req_setfilelock": 0,
>         "req_setlayout": 0,
>         "req_setxattr": 0,
>         "req_symlink": 0,
>         "req_unlink": 122
>     },
>     "mds_sessions": {
>         "session_count": 10,
>         "session_add": 128,
>         "session_remove": 118
>     },
>     "objecter": {
>         "op_active": 0,
>         "op_laggy": 0,
>         "op_send": 136767,
>         "op_send_bytes": 202196534,
>         "op_resend": 0,
>         "op_reply": 136767,
>         "op": 136767,
>         "op_r": 101193,
>         "op_w": 35574,
>         "op_rmw": 0,
>         "op_pg": 0,
>         "osdop_stat": 5,
>         "osdop_create": 0,
>         "osdop_read": 150,
>         "osdop_write": 23587,
>         "osdop_writefull": 11750,
>         "osdop_writesame": 0,
>         "osdop_append": 0,
>         "osdop_zero": 2,
>         "osdop_truncate": 0,
>         "osdop_delete": 228,
>         "osdop_mapext": 0,
>         "osdop_sparse_read": 0,
>         "osdop_clonerange": 0,
>         "osdop_getxattr": 100784,
>         "osdop_setxattr": 0,
>         "osdop_cmpxattr": 0,
>         "osdop_rmxattr": 0,
>         "osdop_resetxattrs": 0,
>         "osdop_tmap_up": 0,
>         "osdop_tmap_put": 0,
>         "osdop_tmap_get": 0,
>         "osdop_call": 0,
>         "osdop_watch": 0,
>         "osdop_notify": 0,
>         "osdop_src_cmpxattr": 0,
>         "osdop_pgls": 0,
>         "osdop_pgls_filter": 0,
>         "osdop_other": 3,
>         "linger_active": 0,
>         "linger_send": 0,
>         "linger_resend": 0,
>         "linger_ping": 0,
>         "poolop_active": 0,
>         "poolop_send": 0,
>         "poolop_resend": 0,
>         "poolstat_active": 0,
>         "poolstat_send": 0,
>         "poolstat_resend": 0,
>         "statfs_active": 0,
>         "statfs_send": 0,
>         "statfs_resend": 0,
>         "command_active": 0,
>         "command_send": 0,
>         "command_resend": 0,
>         "map_epoch": 468,
>         "map_full": 0,
>         "map_inc": 39,
>         "osd_sessions": 3,
>         "osd_session_open": 479,
>         "osd_session_close": 476,
>         "osd_laggy": 0,
>         "omap_wr": 7,
>         "omap_rd": 202074,
>         "omap_del": 1
>     },
>     "purge_queue": {
>         "pq_executing_ops": 0,
>         "pq_executing": 0,
>         "pq_executed": 124
>     },
>     "throttle-msgr_dispatch_throttler-mds": {
>         "val": 0,
>         "max": 104857600,
>         "get_started": 0,
>         "get": 6140428,
>         "get_sum": 2077944682,
>         "get_or_fail_fail": 0,
>         "get_or_fail_success": 6140428,
>         "take": 0,
>         "take_sum": 0,
>         "put": 6140428,
>         "put_sum": 2077944682,
>         "wait": {
>             "avgcount": 0,
>             "sum": 0.000000000,
>             "avgtime": 0.000000000
>         }
>     },
>     "throttle-objecter_bytes": {
>         "val": 0,
>         "max": 104857600,
>         "get_started": 0,
>         "get": 0,
>         "get_sum": 0,
>         "get_or_fail_fail": 0,
>         "get_or_fail_success": 0,
>         "take": 136767,
>         "take_sum": 339484250,
>         "put": 136523,
>         "put_sum": 339484250,
>         "wait": {
>             "avgcount": 0,
>             "sum": 0.000000000,
>             "avgtime": 0.000000000
>         }
>     },
>     "throttle-objecter_ops": {
>         "val": 0,
>         "max": 1024,
>         "get_started": 0,
>         "get": 0,
>         "get_sum": 0,
>         "get_or_fail_fail": 0,
>         "get_or_fail_success": 0,
>         "take": 136767,
>         "take_sum": 136767,
>         "put": 136767,
>         "put_sum": 136767,
>         "wait": {
>             "avgcount": 0,
>             "sum": 0.000000000,
>             "avgtime": 0.000000000
>         }
>     },
>     "throttle-write_buf_throttle": {
>         "val": 0,
>         "max": 3758096384,
>         "get_started": 0,
>         "get": 124,
>         "get_sum": 11532,
>         "get_or_fail_fail": 0,
>         "get_or_fail_success": 124,
>         "take": 0,
>         "take_sum": 0,
>         "put": 109,
>         "put_sum": 11532,
>         "wait": {
>             "avgcount": 0,
>             "sum": 0.000000000,
>             "avgtime": 0.000000000
>         }
>     },
>     "throttle-write_buf_throttle-0x55faf5ba4220": {
>         "val": 0,
>         "max": 3758096384,
>         "get_started": 0,
>         "get": 125666,
>         "get_sum": 198900816,
>         "get_or_fail_fail": 0,
>         "get_or_fail_success": 125666,
>         "take": 0,
>         "take_sum": 0,
>         "put": 23473,
>         "put_sum": 198900816,
>         "wait": {
>             "avgcount": 0,
>             "sum": 0.000000000,
>             "avgtime": 0.000000000
>         }
>     }
> }
> ------------------------------------------------------------
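In the `mds_mem` section, each object type (inodes, dirfrags, dentries,
caps) appears as a live count plus `+` and `-` counters; reading those as
cumulative creations and deletions is an assumption based on the names. A
short Python check that the live count equals created minus destroyed,
using the numbers quoted above; it also converts `rss` and `heap`, assuming
both are in KiB.

```python
import json

# The "mds_mem" section of the perf dump above.
MDS_MEM = json.loads("""
{"ino": 507, "ino+": 1579334, "ino-": 1578827,
 "dir": 312, "dir+": 101932, "dir-": 101620,
 "dn": 529, "dn+": 1580751, "dn-": 1580222,
 "cap": 1192, "cap+": 1825843, "cap-": 1824651,
 "rss": 258840, "heap": 313880, "buf": 0}
""")


def check_live_counts(mem: dict) -> dict:
    """For each object type, compare the live count with created minus destroyed."""
    result = {}
    for kind in ("ino", "dir", "dn", "cap"):
        derived = mem[kind + "+"] - mem[kind + "-"]
        result[kind] = (mem[kind], derived, mem[kind] == derived)
    return result


for kind, (live, derived, ok) in check_live_counts(MDS_MEM).items():
    print(f"{kind:4} live={live:6} created-destroyed={derived:6} match={ok}")

# Assumption: rss and heap are reported in KiB.
print(f"rss ~ {MDS_MEM['rss'] / 1024:.1f} MiB, heap ~ {MDS_MEM['heap'] / 1024:.1f} MiB")
```

All four types match in this snapshot, and rss works out to roughly 253 MiB
under the KiB assumption, in line with the MDS using well under 1 GB at the
time.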
>
>
>
> dump_mempools:
> ------------------------------------------------------------
> {
>     "bloom_filter": {
>         "items": 120,
>         "bytes": 120
>     },
>     "bluestore_alloc": {
>         "items": 0,
>         "bytes": 0
>     },
>     "bluestore_cache_data": {
>         "items": 0,
>         "bytes": 0
>     },
>     "bluestore_cache_onode": {
>         "items": 0,
>         "bytes": 0
>     },
>     "bluestore_cache_other": {
>         "items": 0,
>         "bytes": 0
>     },
>     "bluestore_fsck": {
>         "items": 0,
>         "bytes": 0
>     },
>     "bluestore_txc": {
>         "items": 0,
>         "bytes": 0
>     },
>     "bluestore_writing_deferred": {
>         "items": 0,
>         "bytes": 0
>     },
>     "bluestore_writing": {
>         "items": 0,
>         "bytes": 0
>     },
>     "bluefs": {
>         "items": 0,
>         "bytes": 0
>     },
>     "buffer_anon": {
>         "items": 96401,
>         "bytes": 16010198
>     },
>     "buffer_meta": {
>         "items": 1,
>         "bytes": 88
>     },
>     "osd": {
>         "items": 0,
>         "bytes": 0
>     },
>     "osd_mapbl": {
>         "items": 0,
>         "bytes": 0
>     },
>     "osd_pglog": {
>         "items": 0,
>         "bytes": 0
>     },
>     "osdmap": {
>         "items": 80,
>         "bytes": 3296
>     },
>     "osdmap_mapping": {
>         "items": 0,
>         "bytes": 0
>     },
>     "pgmap": {
>         "items": 0,
>         "bytes": 0
>     },
>     "mds_co": {
>         "items": 17604,
>         "bytes": 2330840
>     },
>     "unittest_1": {
>         "items": 0,
>         "bytes": 0
>     },
>     "unittest_2": {
>         "items": 0,
>         "bytes": 0
>     },
>     "total": {
>         "items": 114206,
>         "bytes": 18344542
>     }
> }
> ------------------------------------------------------------
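The `total` entry of `dump_mempools` should be the sum of the individual
pools. A Python sketch that re-adds the quoted numbers and shows each
pool's share; pools with zero items are left out because they do not change
the sum, and `summarize` is a hypothetical helper.

```python
# Non-empty pools from the dump_mempools output above.
POOLS = {
    "bloom_filter": {"items": 120, "bytes": 120},
    "buffer_anon": {"items": 96401, "bytes": 16010198},
    "buffer_meta": {"items": 1, "bytes": 88},
    "osdmap": {"items": 80, "bytes": 3296},
    "mds_co": {"items": 17604, "bytes": 2330840},
}
REPORTED_TOTAL = {"items": 114206, "bytes": 18344542}


def summarize(pools: dict) -> dict:
    """Sum items and bytes across all pools."""
    return {
        "items": sum(p["items"] for p in pools.values()),
        "bytes": sum(p["bytes"] for p in pools.values()),
    }


total = summarize(POOLS)
print("matches reported total:", total == REPORTED_TOTAL)
for name, p in sorted(POOLS.items(), key=lambda kv: kv[1]["bytes"], reverse=True):
    share = 100 * p["bytes"] / total["bytes"]
    print(f"{name:14} {p['bytes']:>10} bytes ({share:5.1f}%)")
```

The tracked pools add up to about 17.5 MiB, mostly `buffer_anon` and
`mds_co`. Mempools only cover allocations that Ceph assigns to a pool, so
this figure need not match the process RSS.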
>
>
> Sorry for my English!
>
>
> Greetings!!
>
>
>
> On 23 Jul 2018 20:08, "Patrick Donnelly" <[email protected]> wrote:
>
> On Mon, Jul 23, 2018 at 5:48 AM, Daniel Carrasco <[email protected]> wrote:
> > Hi, thanks for your response.
> >
> > There are about 6 clients, and 4 of them are on standby most of the
> > time. Only two are active servers serving the webpage. We also have
> > Varnish in front, so they don't get all the load (below 30% in PHP,
> > which is not much).
> > As for the MDS cache, I now have mds_cache_memory_limit set to 8 MB.
>
> What! Please post `ceph daemon mds.<name> config diff`, `... perf dump`,
> and `... dump_mempools` from the server the active MDS is on.
>
>
> > I've also tested 512 MB, but the CPU usage is the same and the MDS RAM
> > usage grows to 15 GB (on a 16 GB server it starts to swap and everything
> > fails). With 8 MB, at least, memory usage stays stable below 6 GB (it is
> > now using about 1 GB of RAM).
>
> We've seen reports of possible memory leaks before and the potential
> fixes for those were in 12.2.6. How fast does your MDS reach 15GB?
> Your MDS cache size should be configured to 1-8GB (depending on your
> preference) so it's disturbing to see you set it so low.
>
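Patrick's suggested range of 1-8 GB, expressed as the byte value that
`mds_cache_memory_limit` takes. A Python sketch treating GB as GiB;
`cache_limit_bytes` is a hypothetical helper, and `CURRENT` is the value
from the config diff earlier in this thread.

```python
GIB = 1024 ** 3


def cache_limit_bytes(gib: float) -> int:
    """Byte value for mds_cache_memory_limit from a size in GiB."""
    return int(gib * GIB)


CURRENT = 53687091  # mds_cache_memory_limit from the config diff above

low, high = cache_limit_bytes(1), cache_limit_bytes(8)
print(f"1 GiB = {low} bytes, 8 GiB = {high} bytes")
print(f"current limit: {CURRENT / GIB:.3f} GiB, within 1-8 GiB: {low <= CURRENT <= high}")
```

The 1 GiB lower bound is the same value as the `mds_cache_memory_limit`
default shown in the config diff (1073741824); the resulting number is what
would go after `mds_cache_memory_limit =` in the `[mds]` section of
ceph.conf.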
>
> --
> Patrick Donnelly
>
>
>


-- 
_________________________________________

      Daniel Carrasco Marín
      Ingeniería para la Innovación i2TIC, S.L.
      Tlf:  +34 911 12 32 84 Ext: 223
      www.i2tic.com
_________________________________________
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com