On 5/2/19 1:51 PM, Igor Podlesny wrote:
On Fri, 3 May 2019 at 01:29, Mark Nelson <mnel...@redhat.com> wrote:
On 5/2/19 11:46 AM, Igor Podlesny wrote:
On Thu, 2 May 2019 at 05:02, Mark Nelson <mnel...@redhat.com> wrote:
[...]
FWIW, if you still have an OSD up with tcmalloc, it's probably worth
looking at the heap stats to see how much memory tcmalloc thinks it's
allocated vs how much RSS memory is being used by the process.  It's
quite possible that there is memory that has been unmapped but that the
kernel can't (or has decided not yet to) reclaim.
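
A minimal sketch of the comparison suggested above, assuming a Linux host: it reads the resident set size from /proc and subtracts a figure reported by the allocator. The helper names and the idea of passing the allocator's number in by hand (for example, copied from the OSD's heap stats output) are illustrative assumptions, not part of Ceph or tcmalloc.

```python
import os


def rss_kib(status_text):
    """Return the VmRSS value in KiB from the text of /proc/<pid>/status, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # Line format: "VmRSS:     204800 kB"
            return int(line.split()[1])
    return None


def unaccounted_bytes(rss_bytes, allocator_bytes):
    """RSS minus what the allocator believes it holds (hypothetical helper)."""
    return rss_bytes - allocator_bytes


if __name__ == "__main__":
    path = "/proc/self/status"  # replace "self" with the OSD's pid
    if os.path.exists(path):
        with open(path) as f:
            rss = rss_kib(f.read())
        if rss is not None:
            # allocator_bytes=0 is a placeholder; fill in the heap stats figure
            print(unaccounted_bytes(rss * 1024, allocator_bytes=0))
```

A large positive difference would be consistent with memory the allocator has released but the kernel has not reclaimed, though it does not by itself show why.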
Transparent huge pages can potentially have an effect here both with tcmalloc
and with jemalloc so it's not certain that switching the allocator will fix it
entirely.
Most likely wrong. The kernel's default THP setting is "madvise".
Neither tcmalloc nor jemalloc would madvise() to make it happen.
With a recent enough jemalloc you could have it, but it needs special
malloc.conf'ing.

From one of our CentOS nodes with no special actions taken to change
THP settings (though it's possible it was inherited from something else):


$ cat /etc/redhat-release
CentOS Linux release 7.5.1804 (Core)
$ cat /sys/kernel/mm/transparent_hugepage/enabled
[always] madvise never
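
For reference, the bracketed word in that sysfs file is the active mode. A small sketch of picking it out programmatically (the function name is an assumption made for illustration):

```python
import re


def active_thp_mode(text):
    """Return the bracketed (active) mode from the sysfs 'enabled' line, or None."""
    match = re.search(r"\[(\w+)\]", text)
    return match.group(1) if match else None


if __name__ == "__main__":
    path = "/sys/kernel/mm/transparent_hugepage/enabled"
    try:
        with open(path) as f:
            print(active_thp_mode(f.read()))
    except OSError:
        # File is absent on non-Linux hosts or kernels built without THP
        print("THP setting not available")
```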
"madvise" will enter direct reclaim like "always" but only for regions
that are have used madvise(MADV_HUGEPAGE). This is the default behaviour.

-- https://www.kernel.org/doc/Documentation/vm/transhuge.txt


Why are you quoting the description for the madvise setting when that's clearly not what was set in the case I just showed you?



And regarding madvise and alternate memory allocators:
https:
[...]

did you ever read any of it?

one link's info:

"By default jemalloc does not use huge pages for heap memory (there is
opt.metadata_thp which uses THP for internal metadata though)"


"It turns out that jemalloc(3) uses madvise(2) extensively to notify the operating system that it's done with a range of memory which it had previously malloc'ed. Because the machine used transparent huge pages, the page size was 2MB. As such, a lot of the memory which was being marked with madvise(..., MADV_DONTNEED) was within ranges substantially smaller than 2MB. This meant that the operating system never was able to evict pages which had ranges marked as MADV_DONTNEED because the entire page would have to be unneeded to allow it to be reused.

So despite initially looking like a leak, the operating system itself was unable to free memory because of madvise(2) and transparent huge pages. [4] <https://blog.digitalocean.com/transparent-huge-pages-and-alternative-memory-allocators/#fn:4> This led to sustained memory pressure on the machine and redis-server eventually getting OOM killed."
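
The arithmetic behind the quoted explanation can be sketched as follows. This models only the claim in the quote (a 2MB huge page is reusable only when the whole page is unneeded); it is a simplification, not a description of what any particular kernel version does, and the function name is made up for illustration.

```python
HUGE_PAGE = 2 * 1024 * 1024  # 2 MiB, the huge page size mentioned in the quote


def fully_covered_huge_pages(start, length, page=HUGE_PAGE):
    """Count huge pages that lie entirely inside [start, start + length)."""
    first = -(-start // page)        # first page boundary at or after start
    last = (start + length) // page  # last page boundary at or before the end
    return max(0, last - first)


if __name__ == "__main__":
    mib = 1024 * 1024
    # A 1 MiB range never covers a whole 2 MiB page
    print(fully_covered_huge_pages(0, 1 * mib))
    # An aligned 4 MiB range covers two pages
    print(fully_covered_huge_pages(0, 4 * mib))
    # The same 4 MiB shifted by 1 MiB covers only one
    print(fully_covered_huge_pages(1 * mib, 4 * mib))
```

Under this model, many small MADV_DONTNEED ranges can add up to a lot of "freed" memory while covering zero whole huge pages.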


I'm not going to argue with you about this.  Test it if you want or don't.


Mark



(and I've said
Neither tcmalloc nor jemalloc would madvise() to make it happen.
With a recent enough jemalloc you could have it, but it needs special
malloc.conf'ing.
before)

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
