I agree completely. For me it's a no-brainer. I have missed countless nights of sleep, Thanksgiving dinners, weekends, and vacations because of buggy code. I don't complain; it comes with the territory. I've taken short-term consulting jobs where I worked close to 24 hours a day helping resolve critical outages while my vacationing family was in a swimming pool and I sat inside with a laptop. So I appreciate stability. THP is a great idea with a broken implementation.
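In case it's useful to anyone making the same call, here's a quick sketch for checking and flipping the setting. The sysfs path is the standard one on mainline kernels; the `thp_mode` helper is just my own convenience function, not anything the kernel provides:

```shell
# Show the active THP mode; the kernel brackets the selected value,
# e.g. "always madvise [never]".
# (RHEL 6 kernels use /sys/kernel/mm/redhat_transparent_hugepage instead.)
cat /sys/kernel/mm/transparent_hugepage/enabled 2>/dev/null || true

# Helper to extract just the bracketed token, handy in monitoring scripts.
thp_mode() { grep -o '\[[a-z]*\]' | tr -d '[]'; }

# Disable THP at runtime (root required; does not survive a reboot):
#   echo never > /sys/kernel/mm/transparent_hugepage/enabled
# To make it permanent, boot with the kernel parameter:
#   transparent_hugepage=never
```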
Life is too short to deploy known-broken configurations. transparent_hugepage=never has worked well for me so far.

On Wed, Aug 23, 2017 at 1:51 PM, Tom Lee <[email protected]> wrote:

Peter, just want to say I've also seen very similar behavior with JVM heap sizes of ~16GB. I feel like I've seen multiple "failure" modes with THP, but most alarmingly we observed brief system-wide lockups in some cases, similar to those described in https://access.redhat.com/solutions/1560893. (I don't quite recall if we saw that exact "soft lockup" message, but I do recall something similar, and around the time we saw that message we also observed gaps in the output of a separate shell script that was periodically writing a message to a file every 5 seconds.)

I'm probably just scarred from the experience, but to me the question of whether to leave THP=always in such environments feels more like "do I want to gamble on this pathological behavior occurring?" than some dial for fine-tuning performance. Maybe it's better in more recent RHEL kernels, but I never really had a reason to roll the dice on it.

(This shouldn't scare folks off [non-transparent] hugepages entirely, though; I had much better results with those.)

On Wed, Aug 23, 2017 at 3:52 AM, Peter Booth <[email protected]> wrote:

Some points:

Those of us working in large corporate settings are likely to be running close to vanilla RHEL 7.3 or 6.9, with kernel versions 3.10.0-514 or 2.6.32-696 respectively.

I have seen the THP issue first hand, in dramatic fashion. One Java trading application I supported ran with heaps that ranged from 32GB to 64GB, running on Azul Zing, with no appreciable GC pauses. It was migrated from Westmere hardware on RHEL 5.6 to (faster) Ivy Bridge hardware on RHEL 6.4. In non-production environments only, the application suddenly began showing occasional pauses of up to a few seconds.
"Occasional" meaning only four or five out of 30 instances showed a pause, and they might only have one, two, or three pauses in a day. These instances ran a workload that replicated a production workload. I noticed that the only difference between these hosts and the healthy production hosts was that, due to human error, THP was disabled on the production hosts but not the non-prod hosts. As soon as we disabled THP on the non-prod hosts, the pauses disappeared.

This was a reactive discovery; I haven't done any proactive investigation of the effects of THP. It was sufficient for me to rule it out for today.

On Sunday, August 20, 2017 at 10:32:45 AM UTC-4, Alexandr Nikitin wrote:

Thank you for the feedback! Appreciate it. Yes, you are right. The intention was not to show that THP is an awesome feature, but to share techniques to measure and control the risks. I made the changes <https://github.com/alexandrnikitin/blog/compare/2139c405f0c50a3ab907fb2530421bf352caa412...3e58094386b14d19e06752d9faa0435be2cbe651> to highlight the purpose and risks.

The experiment is indeed interesting. I believe the "defer" option should help in that environment. I'm really keen to try the latest kernel (for reasons related not only to THP).

*Frankly, I still don't have a strong opinion about huge latency spikes in the allocation path in general. I'm not sure whether it's a THP issue or the application/environment itself. Likely it's high memory pressure in general that causes the spikes. Or the root of the issue is in something else, e.g. the jemalloc case.*

On Friday, August 18, 2017 at 6:32:40 PM UTC+3, Gil Tene wrote:

This is very well written and quite detailed. It has all the makings of a great post I'd point people to.
However, as currently stated, I'd worry that it would (mis)lead readers into using THP with the "always" /sys/kernel/mm/transparent_hugepage/defrag setting (instead of "defer"), and/or on older (pre-4.6) kernels, with a false sense that the many-msec slow-path allocation latency problems many people warn about don't actually exist. You do link to the discussions on the subject, but the measurements and summary conclusion of the posting alone would not end up warning people who don't actually follow those links.

I assume your intention is not to have the reader conclude that "there is lots of advice out there telling you to turn off THP, and it is wrong. Turning it on is perfectly safe, and may significantly speed up your application", but that you are instead aiming for something like "THP used to be problematic enough to cause wide-ranging recommendations to simply turn it off, but this has changed with recent Linux kernels. It is now safe to use in widely applicable ways (with the right settings) and can really help application performance without risking huge stalls". Unfortunately, I think that many readers would understand the current text as the former, not the latter.

Here is what I'd change to improve on the current text:

1. Highlight the risk of high slow-path allocation latencies with the "always" (and even "madvise") setting in /sys/kernel/mm/transparent_hugepage/defrag, the fact that the "defer" option is intended to address those risks, and that this defer option is available with Linux kernel versions 4.6 or later.

2. Create an environment that would actually demonstrate these very high (many msec or worse) latencies in the allocation slow path with defrag set to "always". This is the part that will probably take some extra work, but it will also be a very valuable contribution.
The issues are so widely reported (into the 100s of msec or more, and with a wide variety of workloads, as your links show) that intentional reproduction *should* be possible. And being able to demonstrate it actually happening will also allow you to demonstrate how newer kernels address it with the defer setting.

3. Show how changing the defrag setting to "defer" removes the high latencies seen by the allocation slow path under the same conditions.

For (2) above, I'd look to induce a situation where the allocation slow path can't find a free 2MB page without having to defragment one directly. E.g.:

- I'd start by significantly slowing down the background defragmentation in khugepaged (e.g. set /sys/kernel/mm/transparent_hugepage/khugepaged/scan_sleep_millisecs to 3600000). I'd avoid turning it off completely, in order to make sure you are still measuring the system in a configuration that believes it does background defragmentation.
- I'd add some static physical memory pressure (e.g. allocate and touch a bunch of anonymous memory in a process that would just sit on it) such that the system would only have 2-3GB free for buffers and your netty workload's heap. A sleeping JVM launched with an empirically sized and big enough -Xmx and -Xms, and with AlwaysPreTouch on, is an easy way to do that.
- I'd then create an intentional and spiky fragmentation load (e.g. perform spikes of scanning through a 20GB file every minute or so).
- With all that in place, I'd then repeatedly launch and run your netty workload without the PreTouch flag, in order to try to induce situations where an on-demand-allocated 2MB heap page hits the slow path, and the effect shows up in your netty latency measurements.
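In case it helps anyone trying this, the setup steps above might look roughly like the script below. This is only a sketch: the heap size, file path, and SleepForever class name are placeholders, the sysfs writes need root, and the later "defer" comparison assumes a 4.6+ kernel.

```shell
# 1. Slow khugepaged's background scan to once an hour (3600000 ms),
#    without turning background defragmentation off entirely.
echo 3600000 > /sys/kernel/mm/transparent_hugepage/khugepaged/scan_sleep_millisecs

# 2. Run the reproduction with direct (synchronous) compaction in the
#    allocation path; rerun later with "defer" (kernel 4.6+) to compare.
echo always > /sys/kernel/mm/transparent_hugepage/defrag

# 3. Static memory pressure: a sleeping JVM that pre-touches a heap sized
#    (empirically) so only ~2-3GB stay free. SleepForever stands in for
#    any class that just parks the process.
java -Xms48g -Xmx48g -XX:+AlwaysPreTouch SleepForever &

# 4. Spiky fragmentation load: stream through a large file once a minute.
( while true; do cat /path/to/20g-file > /dev/null; sleep 60; done ) &

# 5. Repeatedly launch the netty workload WITHOUT -XX:+AlwaysPreTouch, so
#    2MB heap pages are faulted on demand and can hit the slow path.
```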
All the above are obviously experimentation starting points, and may take some iteration to actually induce the demonstrated high latencies we are looking for. But once you are able to demonstrate the impact of on-demand allocation doing direct (synchronous) compaction, both in your application latency measurements and in your kernel tracing data, you would then be able to try the same experiment with the defrag setting set to "defer", to show how newer kernels and this new setting now make it safe (or at least much safer) to use THP. And with that actually demonstrated, everything about THP recommendations for freeze-averse applications can change, making for a really great posting.

Sent from my iPad

On Aug 18, 2017, at 3:00 AM, Alexandr Nikitin <[email protected]> wrote:

I decided to write a post about measuring the performance impact (otherwise it stays in my messy notes forever). Any feedback is appreciated. https://alexandrnikitin.github.io/blog/transparent-hugepages-measuring-the-performance-impact/

On Saturday, August 12, 2017 at 1:01:31 PM UTC+3, Alexandr Nikitin wrote:

I played with Transparent Hugepages some time ago and I want to share some numbers based on real-world high-load applications. We have a JVM application: a high-load TCP server based on netty. There is no single clear bottleneck; CPU, memory, and network are all equally highly loaded. The amount of work depends on request content. The following numbers are based on normal server load, ~40% of the maximum number of requests one server can handle.

*When THP is off:*
End-to-end application latency in microseconds:
"p50" : 718.891,
"p95" : 4110.26,
"p99" : 7503.938,
"p999" : 15564.827,

perf stat -e dTLB-load-misses,iTLB-load-misses -p PID -I 1000
...
25,164,369 iTLB-load-misses
81,154,170 dTLB-load-misses
...

*When THP is always on:*
End-to-end application latency in microseconds:
"p50" : 601.196,
"p95" : 3260.494,
"p99" : 7104.526,
"p999" : 11872.642,

perf stat -e dTLB-load-misses,iTLB-load-misses -p PID -I 1000
...
21,400,513 dTLB-load-misses
4,633,644 iTLB-load-misses
...

As you can see, the THP performance impact is measurable and too significant to ignore: 4.1 ms vs 3.3 ms at p95, and roughly 106M vs 26M combined TLB misses. I also used SystemTap to measure a few kernel functions like collapse_huge_page, clear_huge_page, and split_huge_page. There were no significant spikes while using THP.

AFAIR that was a 3.10 kernel, which is 4 years old now. I can repeat the experiments with newer kernels if there's interest. (I don't know what was changed there, though.)

On Monday, August 7, 2017 at 6:42:21 PM UTC+3, Peter Veentjer wrote:

Hi Everyone,

I'm failing to understand the problem with transparent huge pages.

I 'understand' how normal pages work. A page is typically 4KB in a virtual address space; each process has its own.

I understand how the TLB fits in: a cache providing a mapping of virtual to physical addresses, to speed up address translation.

I understand that using a large page, e.g. 2MB instead of 4KB, can reduce pressure on the TLB.

So far, so good: huge pages seem to make a lot of sense, of course at the expense of wasting memory if only a small section of a page is being used.

The first part I don't understand is: why is it called transparent huge pages? What is transparent about it?

The second part I'm failing to understand is: why can it cause problems?
There are quite a few applications that recommend disabling THP, and I recently helped a customer who was helped by disabling it. It seems there is more going on behind the scenes than an increased page size. Is it caused by fragmentation? That is, if a new page is needed and memory is fragmented (into smaller pages), do those small pages need to be compacted before a new huge page can be allocated? But if this were the only issue, it shouldn't be a problem once all pages for the application have been touched and are retained.

So I'm probably missing something simple.

--
You received this message because you are subscribed to a topic in the Google Groups "mechanical-sympathy" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/mechanical-sympathy/sljzehnCNZU/unsubscribe.
To unsubscribe from this group and all its topics, send an email to [email protected].
For more options, visit https://groups.google.com/d/optout.

--
Tom Lee / http://tomlee.co / @tglee <http://twitter.com/tglee>
