Regarding measurements: I understand that it's hard. In my case, the 
measurements were done on production servers under production load. The servers 
were not overloaded; they ran at ~40% of their capacity. Latencies were 
gathered for a few dozen minutes. Probing of the kernel (khugepaged) functions 
ran for a few hours (I think).
What I didn't measure is the maximum throughput, the slow allocation and 
compaction path Gil mentioned, the page table size, and the page walking time. 
If anyone knows how to probe kernel page walking time, it would be interesting 
to compare whether the page and page table sizes affect it or not.
It could be a good time to repeat the experiments. Please advise on what and 
how to measure.
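
One thing that might work, assuming a reasonably recent Intel CPU: perf exposes 
counters for the cycles spent in hardware page walks (the exact event names 
vary by microarchitecture, so check perf list first). For example:

perf stat -e dtlb_load_misses.walk_duration,itlb_misses.walk_duration -p PID -I 1000

Comparing those cycle counts with THP on and off should show whether the page 
and page table sizes affect the walk time.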

>
> The stack trace example I posted earlier represents the path that will be 
> taken if an on-demand allocation page fault on a THP-allocated region happens 
> when no free 2MB page is available in the system.


To be honest, I thought that if THP fails to allocate a hugepage, it falls 
back to regular pages, and that khugepaged does the compaction logic in the 
background (when the defrag setting is not "always"). That's what I see in the 
docs: https://www.kernel.org/doc/Documentation/vm/transhuge.txt

"- if a hugepage allocation fails because of memory fragmentation,
  regular pages should be gracefully allocated instead and mixed in
  the same vma without any failure or significant delay and without
  userland noticing
"

The compaction/defrag phase can be tuned with its own flags:

/sys/kernel/mm/transparent_hugepage/defrag
/sys/kernel/mm/transparent_hugepage/khugepaged/pages_to_scan
/sys/kernel/mm/transparent_hugepage/khugepaged/alloc_sleep_millisecs
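
For example (the "defrag" values below are from my memory of a 3.10-era kernel; 
newer kernels add more modes such as "defer", so check your own system):

cat /sys/kernel/mm/transparent_hugepage/defrag
[always] madvise never
echo madvise > /sys/kernel/mm/transparent_hugepage/defrag

With "madvise", synchronous compaction is only attempted for regions that ask 
for hugepages via madvise(MADV_HUGEPAGE); everything else is left to khugepaged 
in the background.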

I'm not a kernel expert, though, and I may be wrong. I'm really interested in 
whether those flags could solve or mitigate the freezes people mentioned here.

>
> If that occasional outlier is something you are fine with, then turning THP 
> on for the speed benefits you may be seeing makes sense. But if you can't 
> accept the occasional ~0.5+ sec freezes, turn it off. 


I just wanted to show people who blindly follow advice on the Internet (and 
there are many such suggestions) that there is an impact. It can be noticeable, 
and it depends on the setup and load.



On Sunday, August 13, 2017 at 10:10:01 AM UTC+3, Gil Tene wrote:
>
>
>
> On Saturday, August 12, 2017 at 3:01:31 AM UTC-7, Alexandr Nikitin wrote:
>>
>> I played with Transparent Hugepages some time ago and I want to share some 
>> numbers based on real-world high-load applications.
>> We have a JVM application: a high-load TCP server based on Netty. There is 
>> no clear bottleneck; CPU, memory, and network are all equally highly loaded. 
>> The amount of work depends on the request content.
>> The following numbers are based on normal server load, ~40% of the maximum 
>> number of requests one server can handle.
>>
>> *When THP is off:*
>> End-to-end application latency in microseconds:
>> "p50" : 718.891,
>> "p95" : 4110.26,
>> "p99" : 7503.938,
>> "p999" : 15564.827,
>>
>> perf stat -e dTLB-load-misses,iTLB-load-misses -p PID -I 1000
>> ...
>> ...         25,164,369      iTLB-load-misses
>> ...         81,154,170      dTLB-load-misses
>> ...
>>
>> *When THP is always on:*
>> End-to-end application latency in microseconds:
>> "p50" : 601.196,
>> "p95" : 3260.494,
>> "p99" : 7104.526,
>> "p999" : 11872.642,
>>
>> perf stat -e dTLB-load-misses,iTLB-load-misses -p PID -I 1000
>> ...
>> ...    21,400,513      dTLB-load-misses
>> ...      4,633,644      iTLB-load-misses
>> ...
>>
>> As you can see, the THP performance impact is measurable and too 
>> significant to ignore: 4.1 ms vs 3.3 ms at p95, and roughly 106M vs 26M 
>> combined TLB misses.
>> I also used SystemTap to measure a few kernel functions such as 
>> collapse_huge_page, clear_huge_page, and split_huge_page. There were no 
>> significant spikes with THP.
>> AFAIR that was a 3.10 kernel, which is 4 years old now. I can repeat the 
>> experiments with newer kernels if there's interest. (I don't know what has 
>> changed there, though.)
>>
>
> Unfortunately, just because you didn't run into a huge spike during your 
> test doesn't mean it won't hit you in the future... The stack trace example 
> I posted earlier represents the path that will be taken if an on-demand 
> allocation page fault on a THP-allocated region happens when no free 2MB 
> page is available in the system. Inducing that behavior is not that hard, 
> e.g. just do a bunch of high volume journaling or logging, and you'll 
> probably trigger it eventually. And when it does take that path, that will 
> be your thread de-fragging the entire system's physical memory, one 2MB 
> page at a time.
>
> And when that happens, you're probably not talking 10-20 msec. More like 
> several hundreds of msec (growing with the system physical memory size, the 
> specific stack trace is taken from a RHEL issue that reported >22 seconds). 
> If that occasional outlier is something you are fine with, then turning THP 
> on for the speed benefits you may be seeing makes sense. But if you can't 
> accept the occasional ~0.5+ sec freezes, turn it off. 
>
>
>>
>> On Monday, August 7, 2017 at 6:42:21 PM UTC+3, Peter Veentjer wrote:
>>>
>>> Hi Everyone,
>>>
>>> I'm failing to understand the problem with transparent huge pages.
>>>
>>> I 'understand' how normal pages work. A page is typically 4KB in a 
>>> virtual address space; each process has its own. 
>>>
>>> I understand how the TLB fits in; a cache providing a mapping of virtual 
>>> to real addresses to speed up address conversion.
>>>
>>> I understand that using a large page, e.g. 2MB instead of 4KB, can 
>>> reduce pressure on the TLB.
>>>
>>> So far it looks like huge pages make a lot of sense; of course at the 
>>> expense of wasting memory if only a small section of a page is being 
>>> used. 
>>>
>>> The first part I don't understand is: why is it called transparent huge 
>>> pages? So what is transparent about it? 
>>>
>>> The second part I'm failing to understand is: why can it cause problems? 
>>> There are quite a few applications that recommend disabling THP, and I 
>>> recently helped a customer who was helped by disabling it. It seems there 
>>> is more going on behind the scenes than an increased page size. Is it 
>>> caused by fragmentation? That is, if a new huge page is needed and memory 
>>> is fragmented (due to smaller pages), the small pages need to be compacted 
>>> before a new huge page can be allocated? But if that were the only thing, 
>>> it shouldn't be a problem once all of the application's pages have been 
>>> touched and retained.
>>>
>>> So I'm probably missing something simple.
>>>
>>>
