On Aug 18, 2015, at 11:53 AM, Paul Marquess <paul.marqu...@owmobility.com> wrote:
>> From: Jason Evans [mailto:jas...@canonware.com]
>
>> On Aug 18, 2015, at 8:49 AM, Paul Marquess <paul.marqu...@owmobility.com>
>> wrote:
>>>> From: Jason Evans [mailto:jas...@canonware.com]
>>>>
>>>> On Aug 18, 2015, at 5:14 AM, Paul Marquess <paul.marqu...@owmobility.com>
>>>> wrote:
>>>>> I see a reference to a fix for arena_tcache_fill_small and corruption in
>>>>> the 4.0 ChangeLog. Any chance it could be the root cause for this issue?
>>>>
>>>> It's possible, but the failure mode for that bug depends on failing to map
>>>> memory (i.e. extreme memory pressure).
>>>
>>> Do you mean a failure in the call to mmap? Assume that isn't necessarily
>>> catastrophic (otherwise I assume you would assert straight away).
>>
>> Yes, mmap() and sbrk() failure. It should simply result in malloc()
>> returning NULL, but the arena_tcache_fill_small bug you mentioned caused
>> corruption that would later cause crashes.
>
> Guess we need to wrap jemalloc's malloc and get it to assert when it gets a
> null. Perhaps get a dump of jemalloc's state -- would the stats interface in
> jemalloc still be operational if we are OOM? Alternative is to get the
> stats from the core -- I see there are a couple of core file postmortem
> scripts for jemalloc knocking about, but none seem to support 3.6.
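The wrapper idea in the quoted paragraph above could look roughly like the sketch below. It is only an illustration, not code from jemalloc or from this thread: `checked_malloc`, `oom_hook`, and `default_oom_hook` are hypothetical names, and the sketch uses plain `malloc()` so that it builds without jemalloc. Whether a stats dump would still work once the process is out of memory is left open here.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical hook type: called with the failed request size before abort(). */
typedef void (*oom_hook_fn)(size_t size);

/* Hypothetical default hook. Assumption: in a build linked against jemalloc,
 * a stats dump such as malloc_stats_print(NULL, NULL, NULL) could be added
 * here; whether that is usable under OOM is not established. */
void default_oom_hook(size_t size)
{
    fprintf(stderr, "malloc(%zu) returned NULL\n", size);
}

oom_hook_fn oom_hook = default_oom_hook;

/* Hypothetical wrapper: behaves like malloc(), except that a NULL result
 * for a nonzero request aborts, so the core is taken at the first failed
 * allocation rather than at some later crash. */
void *checked_malloc(size_t size)
{
    void *p = malloc(size);

    if (p == NULL && size != 0) {
        if (oom_hook != NULL)
            oom_hook(size);
        abort();
    }
    return p;
}
```

Callers would use `checked_malloc(n)` wherever they currently call `malloc(n)`. Aborting at the point of failure keeps the core file close to the allocation that failed.
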

You might be able to strace and audit the mmap() failures, but an easier
solution would be to add an abort() in the known bad code path within
arena_tcache_fill_small() so that you know if you've hit the failure mode.

> Something else has occurred to me - we had a problem with THP and
> uninterruptible sleep (~30 seconds) very recently that was fixed by tuning
> the swappiness parameter. When researching that I spotted a number of threads
> that suggested that the combination of THP and jemalloc can result in memory
> growth. This thread is an example:
> https://www.digitalocean.com/company/blog/transparent-huge-pages-and-alternative-memory-allocators/
> I know it's too much of a stretch to suggest that this is the root cause
> of the OOM, but if it does cause memory growth it won't help.
>
> Do you have any feeling whether it is safe to have jemalloc and THP at the
> same time?

I've had pretty poor experience with the mixture even within the past month.
The problem is that at some point (under a day of intermittent benchmarking
in all the cases I observed) the kernel gets into a fragmented memory state
that it cannot recover from without a reboot, and the only obvious
indications are decreased performance and increased page faults.

Jason
_______________________________________________
jemalloc-discuss mailing list
jemalloc-discuss@canonware.com
http://www.canonware.com/mailman/listinfo/jemalloc-discuss