I agree in principle. Keeping the system-wide THP config setting at
"madvise" (as opposed to "always" or "never") should allow regions that
were carefully and explicitly designated for THP via madvise to benefit
from THP without carrying the blame for widespread latency spikes. It
would certainly be less exposed than applying THP by default to all
anonymous memory. But it is unfortunately still very widely exposed to
latency spikes. The reason for this wide exposure is that not everyone is
as careful and smart about madvise use of THP and its latency
implications, and unfortunately the barn-door settings available are
either "open for everyone" or "closed to everyone". There is no "open
just for me" setting.
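For context, the barn-door setting in question is the sysfs THP knob
(paths assume a typical Linux layout; the bracketed value is the active
one):

```shell
$ cat /sys/kernel/mm/transparent_hugepage/enabled
always [madvise] never
# The setting is system-wide only: as root you can pick one of the three,
# e.g.:  echo never > /sys/kernel/mm/transparent_hugepage/enabled
# There is no per-process "open just for me" variant of this knob.
```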
The weakness of the madvise mode is that it is enough for one piece of
code that your application ends up using to be less than perfect in how
it uses the THP madvise for the entire application to experience huge
latencies. You can only turn the "madvise" behavior itself on/off
globally; you can't enable it just for the part that you know and
understand well. Basically, choosing to let your well-understood,
responsible code "run with [huge latency] scissors" means that all other
code that may have used this madvise as an "optimization" is also running
with the same scissors.
Unfortunately, lots of code, including common libraries you may not be
thinking of, may have chosen to use a THP madvise purely for its
throughput benefit, without considering the latency implications and
protecting against them (by e.g. retouching or otherwise pre-populating
the physical memory allocation), and as a result may have used it on
regions for which physical memory might end up being allocated on demand.
And before you (the reader) start down the obvious "who would do that?"
and "what are the chances..." thought chains, take a look at the
following discussion about potentially changing all of malloc in Go to do
exactly that: https://github.com/golang/go/issues/14264, titled "runtime:
consider backing malloc structures with large pages". The issue discusses
the possibility of using madvise(...,..., MADV_HUGEPAGE) (which doesn't
mean that's what they'll end up doing, but it's a great example of how
that might end up happening). It includes some measurements that show
throughput benefits when huge pages are used, but contains no discussion
of the potential latency-outlier downsides. A similar discussion can
easily end up with all malloc'ed objects being
on-demand-page-fault-allocated in THP-advised pages in some key allocator
in some system.
On the other end of the spectrum is an interesting jemalloc discussion
that ends up in the opposite place: it describes how NuoDB internally
patched their jemalloc to madvise(...,..., MADV_NOHUGEPAGE) on all
jemalloc pages, to get around surprising interplay issues between THP and
madvise(...,..., MADV_DONTNEED).
So madvise may come with some sharp edges. And if those edges don't hurt
you right now, they might change to hurt you soon (e.g. in an upcoming
version of Go or of yourFavoriteCoolMalloc). That's why my knee-jerk
recommendation, whenever someone asks me about unexplained latency
glitches, is to turn THP completely off as a first step.
For reference, in Zing we separately guarantee locked-down 2MB mappings
for all pages of the Java heap, code cache, permgen, and various GC
support structures, regardless of THP or hugetlb settings (the equivalent
of hugetlbfs, but without needing those settings). And we probably
wouldn't use a THP madvise on any of the other memory regions. So for us
either "madvise" or "never" would work just as well.
On Wednesday, August 9, 2017 at 6:50:38 AM UTC-7, Aleksey Shipilev wrote:
> On 08/08/2017 06:44 PM, Gil Tene wrote:
> > On Monday, August 7, 2017 at 11:50:27 AM UTC-7, Alen Vrečko wrote:
> > Saw this a while back.
> > https://shipilev.net/jvm-anatomy-park/2-transparent-huge-pages/
> > <https://shipilev.net/jvm-anatomy-park/2-transparent-huge-pages/>
> > Basically using THP/defrag with madvise and using
> > -XX:+UseTransparentHugePages -XX:+AlwaysPreTouch JVM opts.
> > Looks like the defrag cost should be paid in full at startup due to
> > AlwaysPreTouch. Never got around to test this in production. Just
> > THP disabled. Thoughts?
> > The above flags would only cover the Java heap in a Java application.
> > So obviously THP for non-Java things doesn't get helped by that.
> So, this is the reason why to use THP in "madvise" mode? Then JVM
> madvise-s Java heap to THP,
> upfront defrags it with AlwaysPreTouch, but native allocations stay
> outside of THP path, and thus do
> not incur defrag latencies. If there is a native structure that allocates
> with madvise hint and does
> heavy churn causing defrag, I'd say it should have not madvise in the
> first place.