My understanding is that the kernel is either unable or unwilling (perhaps because there is no memory pressure) to reclaim the memory. You might see better behavior if you set /sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_none to a low value (maybe 0), or if you disable transparent huge pages entirely.
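
To check what a host is currently doing before changing anything, a minimal read-only sketch along these lines may help. It assumes a Linux host and uses the sysfs paths given in the kernel's transhuge.txt (linked below); the helper names are made up for illustration, and the script changes nothing.

```python
import re
from pathlib import Path

# Sysfs directory for transparent huge pages, per the kernel's transhuge.txt.
THP_DIR = Path("/sys/kernel/mm/transparent_hugepage")


def active_choice(text):
    """Return the bracketed entry from a line such as 'always [madvise] never'."""
    match = re.search(r"\[(\w+)\]", text)
    return match.group(1) if match else None


def read_setting(path):
    """Read a sysfs file, returning None if it is missing or unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    enabled = read_setting(THP_DIR / "enabled")
    max_ptes_none = read_setting(THP_DIR / "khugepaged" / "max_ptes_none")
    mode = active_choice(enabled) if enabled is not None else None
    print("THP mode:", mode if mode is not None else "unavailable")
    print("khugepaged max_ptes_none:",
          max_ptes_none if max_ptes_none is not None else "unavailable")
```

Writing new values to these files requires root and does not persist across reboots.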

Some background:

https://github.com/gperftools/gperftools/issues/1073

https://blog.nelhage.com/post/transparent-hugepages/

https://www.kernel.org/doc/Documentation/vm/transhuge.txt


Mark


On 4/9/19 7:31 AM, Olivier Bonvalet wrote:
Well, Dan seems to be right:

_tune_cache_size
         target: 4294967296
           heap: 6514409472
       unmapped: 2267537408
         mapped: 4246872064
old cache_size: 2845396873
new cache size: 2845397085


So we have 6GB in heap, but "only" 4GB mapped.

But shouldn't "ceph tell osd.* heap release" have released that?
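
As a quick check on the figures above, the sketch below converts them to GiB and confirms how they relate. The values are copied from the _tune_cache_size output in this message; the observation that heap minus unmapped equals mapped comes from these numbers only and is not a statement about the allocator in general.

```python
GIB = 2 ** 30

# Values copied from the _tune_cache_size output quoted in this message.
stats = {
    "target": 4294967296,
    "heap": 6514409472,
    "unmapped": 2267537408,
    "mapped": 4246872064,
}

# In this sample the heap splits exactly into mapped and unmapped memory.
assert stats["heap"] - stats["unmapped"] == stats["mapped"]

# The mapped memory is already just below the target.
headroom = stats["target"] - stats["mapped"]

for name, value in stats.items():
    print(f"{name:>8}: {value / GIB:.2f} GiB")
print(f"target - mapped: {headroom / 2 ** 20:.1f} MiB")
```

So, judged by mapped memory, this OSD is within its target; the extra 2 GiB or so is memory that has been unmapped but still shows up in the heap size.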


Thanks,

Olivier


On Monday, 8 April 2019 at 16:09 -0500, Mark Nelson wrote:
One of the difficulties with the osd_memory_target work is that we can't tune based on the RSS memory usage of the process. Ultimately it's up to the kernel to decide to reclaim memory, and especially with transparent huge pages it's tough to judge what the kernel is going to do even if memory has been unmapped by the process. Instead, the autotuner looks at how much memory has been mapped and tries to balance the caches based on that.


In addition to Dan's advice, you might also want to enable debug bluestore at level 5 and look for lines containing "target:" and "cache_size:". These will tell you the current target, the mapped memory, unmapped memory, heap size, previous aggregate cache size, and new aggregate cache size. The other line will give you a breakdown of how much memory was assigned to each of the bluestore caches and how much each cache is using. If there is a memory leak, the autotuner can only do so much. At some point it will reduce the caches to fit within cache_min and leave it there.
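
For pulling these numbers out of a log, a rough sketch follows. It assumes the values appear as "name: integer" lines like the _tune_cache_size excerpt quoted elsewhere in this thread; real log lines carry timestamps and prefixes that would need stripping first, and the function name is made up for illustration.

```python
import re

# Matches lines of the form "<name>: <integer>", ignoring leading spaces.
LINE_RE = re.compile(r"^\s*([A-Za-z_ ]+?):\s*(\d+)\s*$")


def parse_tuner_stats(text):
    """Collect 'name: value' pairs from a block of autotuner output."""
    stats = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            stats[match.group(1).strip()] = int(match.group(2))
    return stats


# Excerpt as posted in this thread (already stripped of log prefixes).
SAMPLE = """\
_tune_cache_size
         target: 4294967296
           heap: 6514409472
       unmapped: 2267537408
         mapped: 4246872064
old cache_size: 2845396873
new cache size: 2845397085
"""

if __name__ == "__main__":
    for name, value in parse_tuner_stats(SAMPLE).items():
        print(f"{name}: {value}")
```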


Mark


On 4/8/19 5:18 AM, Dan van der Ster wrote:
Which OS are you using?
With CentOS we find that the heap is not always automatically released. (You can check the heap freelist with `ceph tell osd.0 heap stats`.)
As a workaround we run this hourly:

ceph tell mon.* heap release
ceph tell osd.* heap release
ceph tell mds.* heap release
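
If you would rather drive this from a script than from a shell cron entry, a minimal sketch might look like the following. It only wraps the three commands above; the function names are made up for illustration, and scheduling (cron, a systemd timer) is left to you.

```python
import subprocess

# Daemon types covered by the workaround above.
DAEMON_TYPES = ("mon", "osd", "mds")


def heap_release_commands(daemon_types=DAEMON_TYPES):
    """Build one 'ceph tell <type>.* heap release' argv per daemon type."""
    return [["ceph", "tell", f"{daemon}.*", "heap", "release"]
            for daemon in daemon_types]


def release_heaps(run=subprocess.run):
    """Run each command in turn; a failure of one does not stop the others."""
    for argv in heap_release_commands():
        # Passing a list avoids the shell, so "osd.*" reaches ceph unexpanded.
        run(argv, check=False)
```

Calling release_heaps() on a node with an admin keyring should be equivalent to running the three commands by hand.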

-- Dan

On Sat, Apr 6, 2019 at 1:30 PM Olivier Bonvalet <ceph.l...@daevel.fr> wrote:
Hi,

On a Luminous 12.2.11 deployment, my bluestore OSDs exceed the osd_memory_target:

daevel-ob@ssdr712h:~$ ps auxw | grep ceph-osd
ceph        3646 17.1 12.0 6828916 5893136 ?     Ssl  mars29 1903:42 /usr/bin/ceph-osd -f --cluster ceph --id 143 --setuser ceph --setgroup ceph
ceph        3991 12.9 11.2 6342812 5485356 ?     Ssl  mars29 1443:41 /usr/bin/ceph-osd -f --cluster ceph --id 144 --setuser ceph --setgroup ceph
ceph        4361 16.9 11.8 6718432 5783584 ?     Ssl  mars29 1889:41 /usr/bin/ceph-osd -f --cluster ceph --id 145 --setuser ceph --setgroup ceph
ceph        4731 19.7 12.2 6949584 5982040 ?     Ssl  mars29 2198:47 /usr/bin/ceph-osd -f --cluster ceph --id 146 --setuser ceph --setgroup ceph
ceph        5073 16.7 11.6 6639568 5701368 ?     Ssl  mars29 1866:05 /usr/bin/ceph-osd -f --cluster ceph --id 147 --setuser ceph --setgroup ceph
ceph        5417 14.6 11.2 6386764 5519944 ?     Ssl  mars29 1634:30 /usr/bin/ceph-osd -f --cluster ceph --id 148 --setuser ceph --setgroup ceph
ceph        5760 16.9 12.0 6806448 5879624 ?     Ssl  mars29 1882:42 /usr/bin/ceph-osd -f --cluster ceph --id 149 --setuser ceph --setgroup ceph
ceph        6105 16.0 11.6 6576336 5694556 ?     Ssl  mars29 1782:52 /usr/bin/ceph-osd -f --cluster ceph --id 150 --setuser ceph --setgroup ceph

daevel-ob@ssdr712h:~$ free -m
                total        used        free      shared  buff/cache   available
Mem:          47771       45210        1643          17         917       43556
Swap:             0           0           0

# ceph daemon osd.147 config show | grep memory_target
      "osd_memory_target": "4294967296",
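
To put the two figures side by side, the sketch below takes one of the ps lines above and compares its RSS with the configured target. It assumes the usual `ps aux` layout, where the sixth column is RSS in KiB; the function name is made up for illustration.

```python
GIB = 2 ** 30
OSD_MEMORY_TARGET = 4294967296  # bytes, from "config show" above


def parse_ps_line(line):
    """Return (osd_id, rss_bytes) from one 'ps aux' line of a ceph-osd process."""
    fields = line.split()
    rss_kib = int(fields[5])  # assumption: column 6 of 'ps aux' is RSS in KiB
    osd_id = int(fields[fields.index("--id") + 1])
    return osd_id, rss_kib * 1024


# First ps line from the listing above, joined onto a single line.
SAMPLE = ("ceph        3646 17.1 12.0 6828916 5893136 ?     Ssl  mars29 "
          "1903:42 /usr/bin/ceph-osd -f --cluster ceph --id 143 "
          "--setuser ceph --setgroup ceph")

if __name__ == "__main__":
    osd_id, rss = parse_ps_line(SAMPLE)
    over = rss - OSD_MEMORY_TARGET
    print(f"osd.{osd_id}: RSS {rss / GIB:.2f} GiB, "
          f"{over / GIB:.2f} GiB over the target")
```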


There is no recovery or backfilling going on, and the cluster is healthy:

     $ ceph status
       cluster:
         id:     de035250-323d-4cf6-8c4b-cf0faf6296b1
         health: HEALTH_OK

       services:
         mon: 5 daemons, quorum tolriq,tsyne,olkas,lorunde,amphel
         mgr: tsyne(active), standbys: olkas, tolriq, lorunde, amphel
         osd: 120 osds: 116 up, 116 in

       data:
         pools:   20 pools, 12736 pgs
         objects: 15.29M objects, 31.1TiB
         usage:   101TiB used, 75.3TiB / 177TiB avail
         pgs:     12732 active+clean
                  4     active+clean+scrubbing+deep

       io:
         client:   72.3MiB/s rd, 26.8MiB/s wr, 2.30kop/s rd, 1.29kop/s wr


     On another host, in the same pool, I also see high memory usage:

     daevel-ob@ssdr712g:~$ ps auxw | grep ceph-osd
     ceph        6287  6.6 10.6 6027388 5190032 ?     Ssl  mars21 1511:07 /usr/bin/ceph-osd -f --cluster ceph --id 131 --setuser ceph --setgroup ceph
     ceph        6759  7.3 11.2 6299140 5484412 ?     Ssl  mars21 1665:22 /usr/bin/ceph-osd -f --cluster ceph --id 132 --setuser ceph --setgroup ceph
     ceph        7114  7.0 11.7 6576168 5756236 ?     Ssl  mars21 1612:09 /usr/bin/ceph-osd -f --cluster ceph --id 133 --setuser ceph --setgroup ceph
     ceph        7467  7.4 11.1 6244668 5430512 ?     Ssl  mars21 1704:06 /usr/bin/ceph-osd -f --cluster ceph --id 134 --setuser ceph --setgroup ceph
     ceph        7821  7.7 11.1 6309456 5469376 ?     Ssl  mars21 1754:35 /usr/bin/ceph-osd -f --cluster ceph --id 135 --setuser ceph --setgroup ceph
     ceph        8174  6.9 11.6 6545224 5705412 ?     Ssl  mars21 1590:31 /usr/bin/ceph-osd -f --cluster ceph --id 136 --setuser ceph --setgroup ceph
     ceph        8746  6.6 11.1 6290004 5477204 ?     Ssl  mars21 1511:11 /usr/bin/ceph-osd -f --cluster ceph --id 137 --setuser ceph --setgroup ceph
     ceph        9100  7.7 11.6 6552080 5713560 ?     Ssl  mars21 1757:22 /usr/bin/ceph-osd -f --cluster ceph --id 138 --setuser ceph --setgroup ceph

     But on a similar host, in a different pool, the problem is less visible:

     daevel-ob@ssdr712i:~$ ps auxw | grep ceph-osd
     ceph        3617  2.8  9.9 5660308 4847444 ?     Ssl  mars29 313:05 /usr/bin/ceph-osd -f --cluster ceph --id 151 --setuser ceph --setgroup ceph
     ceph        3958  2.3  9.8 5661936 4834320 ?     Ssl  mars29 256:55 /usr/bin/ceph-osd -f --cluster ceph --id 152 --setuser ceph --setgroup ceph
     ceph        4299  2.3  9.8 5620616 4807248 ?     Ssl  mars29 266:26 /usr/bin/ceph-osd -f --cluster ceph --id 153 --setuser ceph --setgroup ceph
     ceph        4643  2.3  9.6 5527724 4713572 ?     Ssl  mars29 262:50 /usr/bin/ceph-osd -f --cluster ceph --id 154 --setuser ceph --setgroup ceph
     ceph        5016  2.2  9.7 5597504 4783412 ?     Ssl  mars29 248:37 /usr/bin/ceph-osd -f --cluster ceph --id 155 --setuser ceph --setgroup ceph
     ceph        5380  2.8  9.9 5700204 4886432 ?     Ssl  mars29 321:05 /usr/bin/ceph-osd -f --cluster ceph --id 156 --setuser ceph --setgroup ceph
     ceph        5724  3.1 10.1 5767456 4953484 ?     Ssl  mars29 352:55 /usr/bin/ceph-osd -f --cluster ceph --id 157 --setuser ceph --setgroup ceph
     ceph        6070  2.7  9.9 5683092 4868632 ?     Ssl  mars29 309:10 /usr/bin/ceph-osd -f --cluster ceph --id 158 --setuser ceph --setgroup ceph


     Is there a memory leak? Or should I expect that osd_memory_target (the default 4GB here) is not strictly enforced, and reduce it accordingly?

     Thanks,


_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com