I agree in principle. Keeping the system-wide THP config setting at 
"madvise" (as opposed to "always" or "never") should allow regions that 
are carefully and explicitly designated for THP via madvise to benefit 
from THP without carrying the blame for widespread latency spikes. It 
would certainly be less exposed than applying THP by default to all 
anonymous memory. But it is unfortunately still very widely exposed to 
latency spikes. The reason for this wide exposure is that not everyone is 
as careful and smart about madvise use of THP and its latency 
implications, and unfortunately the barn-door settings available are 
either "open for everyone" or "closed to everyone". There is no "open 
just for me" setting.

The weakness with the madvise mode is that it is enough for one piece of 
code that your application ends up using to be less-than-perfect in how 
it uses the THP madvise for the entire application to experience huge 
latencies. You can only turn the "madvise" behavior itself on or off 
globally, and can't apply it only to the parts you know and understand 
well. Basically, choosing to let your well-understood, responsible code 
"run with [huge latency] scissors" means that all other code that may 
have used this madvise as an "optimization" is also running with the same 
scissors. Unfortunately, lots of code, including common libraries you may 
not be thinking of, may have chosen to use a THP madvise purely for its 
throughput benefit without considering the latency implications and 
protecting against them (by e.g. re-touching or otherwise pre-populating 
the physical memory allocation), and as a result may have used it on 
regions for which physical memory might end up being on-demand allocated.
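
To make the "protecting against them" part concrete, here is a rough 
sketch of what the careful version looks like (the size and structure are 
purely illustrative, not taken from any particular library): advise the 
region for THP, then pre-touch it so any allocation/compaction cost is 
paid up front rather than at some later page fault:

    /* Sketch only: an anonymous mapping advised for THP, then pre-touched
     * so physical (huge) pages are populated up front instead of being
     * on-demand allocated at first access in a latency-sensitive path. */
    #define _GNU_SOURCE
    #include <sys/mman.h>
    #include <stdio.h>

    int main(void) {
        size_t len = 64UL * 1024 * 1024;  /* arbitrary example size */

        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        /* Opt this region in to THP (only matters in "madvise" mode). */
        if (madvise(p, len, MADV_HUGEPAGE) != 0)
            perror("madvise(MADV_HUGEPAGE)");  /* advisory; non-fatal */

        /* Pre-touch: write one byte per 4KB page so the allocation and
         * any defrag work happens here, not at a later page fault. */
        for (size_t off = 0; off < len; off += 4096)
            ((volatile char *)p)[off] = 0;

        /* ... hand the region to the latency-sensitive code ... */
        munmap(p, len);
        return 0;
    }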

And before you (the reader) start down the obvious "who would do that?" 
and "what are the chances..." thought chains, take a look at the 
following discussion about potentially changing all of malloc in Go to do 
exactly that: https://github.com/golang/go/issues/14264, titled "runtime: 
consider backing malloc structures with large pages". The issue discusses 
the possibility of using madvise(..., ..., MADV_HUGEPAGE) (that doesn't 
mean it's what they'll end up doing, but it's a great example of how that 
might end up happening). It includes some measurements that show 
throughput benefits when huge pages are used, but contains no discussion 
of the potential latency outlier downsides. A similar discussion can 
easily end up with all malloc'ed objects being on-demand-page-fault- 
allocated in THP-advised pages in some key allocator in some system.

On the other end of the spectrum is a cool jemalloc discussion that ends 
up with the opposite conclusion: 
https://www.nuodb.com/techblog/linux-transparent-huge-pages-jemalloc-and-nuodb 
describes how NuoDB internally patched their jemalloc to madvise(..., ..., 
MADV_NOHUGEPAGE) on all jemalloc pages to get around surprising interplay 
issues between THP and madvise(..., ..., MADV_DONTNEED).
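
At the API level the opt-out direction is just as simple. A rough sketch 
of the shape of that kind of change (the helper name and size here are 
mine, purely illustrative; the real patch of course lives inside 
jemalloc's own allocation paths):

    /* Sketch: explicitly exclude a region from THP, so that later 4KB
     * page management (e.g. madvise(MADV_DONTNEED) on subranges) doesn't
     * interact with a backing huge page in surprising ways. */
    #define _GNU_SOURCE
    #include <sys/mman.h>
    #include <stdio.h>

    static void *alloc_chunk_no_thp(size_t len) {
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) return NULL;
        if (madvise(p, len, MADV_NOHUGEPAGE) != 0)
            perror("madvise(MADV_NOHUGEPAGE)");  /* advisory; safe to ignore */
        return p;
    }

    int main(void) {
        size_t len = 4UL * 1024 * 1024;  /* arbitrary chunk size */
        void *chunk = alloc_chunk_no_thp(len);
        if (chunk) munmap(chunk, len);
        return chunk ? 0 : 1;
    }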

So madvise may come with some sharp edges. And if those edges don't hurt 
you right now, they might change to hurt you soon (e.g. in an upcoming 
version of Go or of yourFavoriteCoolMalloc). That's why my knee-jerk 
recommendation is to turn THP completely off as a first step whenever 
someone asks me about unexplained latency glitches.

For reference, in Zing we separately guarantee locked-down 2MB mappings 
for all pages in the Java heap, code cache, permgen, and various GC 
support structures regardless of THP or hugetlb settings (the equivalent 
of hugetlbfs, but without needing those settings). And we probably 
wouldn't use a THP madvise on any of the other memory regions. So for us, 
either "madvise" or "never" would work just as well.

On Wednesday, August 9, 2017 at 6:50:38 AM UTC-7, Aleksey Shipilev wrote:
>
> On 08/08/2017 06:44 PM, Gil Tene wrote: 
> > On Monday, August 7, 2017 at 11:50:27 AM UTC-7, Alen Vrečko wrote: 
> >     Saw this a while back. 
> > 
> >     https://shipilev.net/jvm-anatomy-park/2-transparent-huge-pages/ 
> >     <https://shipilev.net/jvm-anatomy-park/2-transparent-huge-pages/> 
> > 
> >     Basically using THP/defrag with madvise and using 
> >     -XX:+UseTransparentHugePages -XX:+AlwaysPreTouch JVM opts. 
> > 
> >     Looks like the defrag cost should be paid in full at startup due to 
> >     AlwaysPreTouch. Never got around to test this in production. Just 
> have 
> >     THP disabled. Thoughts? 
> > 
> > The above flags would only cover the Java heap. In a Java application. 
> So obviously THP for non-Java 
> > things doesn't get helped by that. 
>
> So, this is the reason why to use THP in "madvise" mode? Then JVM 
> madvise-s Java heap to THP, 
> upfront defrags it with AlwaysPreTouch, but native allocations stay 
> outside of THP path, and thus do 
> not incur defrag latencies. If there is a native structure that allocates 
> with madvise hint and does 
> heavy churn causing defrag, I'd say it should have not madvise in the 
> first place. 
>
> -Aleksey 
>
>
