I decided to write a post about measuring the performance impact of 
Transparent Hugepages (otherwise it stays in my messy notes forever). 
Any feedback is appreciated.
https://alexandrnikitin.github.io/blog/transparent-hugepages-measuring-the-performance-impact/
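
For quick reference, here is a small sketch that recomputes the relative differences from the figures in my earlier message (quoted below). The numbers are copied verbatim from that message; the helper names `parse_tlb_misses` and `reduction_pct` are made up for this illustration, and the regex assumes the `perf stat` line layout shown in the quote.

```python
import re

# Latency figures (microseconds) copied from the quoted message.
LATENCY_US = {
    "off":    {"p50": 718.891, "p95": 4110.26, "p99": 7503.938, "p999": 15564.827},
    "always": {"p50": 601.196, "p95": 3260.494, "p99": 7104.526, "p999": 11872.642},
}

# perf stat samples as quoted; the leading columns were elided in the original.
PERF_OFF = """
...         25,164,369      iTLB-load-misses
...         81,154,170      dTLB-load-misses
"""
PERF_ALWAYS = """
...    21,400,513      dTLB-load-misses
...      4,633,644      iTLB-load-misses
"""


def parse_tlb_misses(text):
    """Map event name -> count for each dTLB/iTLB line in a perf stat snippet."""
    counts = {}
    for match in re.finditer(r"([\d,]+)\s+([di]TLB-load-misses)", text):
        counts[match.group(2)] = int(match.group(1).replace(",", ""))
    return counts


def reduction_pct(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100.0


off = parse_tlb_misses(PERF_OFF)
on = parse_tlb_misses(PERF_ALWAYS)

for percentile, value_off in LATENCY_US["off"].items():
    value_on = LATENCY_US["always"][percentile]
    print(f"{percentile}: {reduction_pct(value_off, value_on):.1f}% lower with THP")

total_off = sum(off.values())
total_on = sum(on.values())
print(f"TLB misses: {total_off:,} -> {total_on:,} "
      f"({reduction_pct(total_off, total_on):.1f}% fewer)")
```

This only compares the two single samples quoted in the thread, so treat the percentages as indicative rather than statistically meaningful.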

On Saturday, August 12, 2017 at 1:01:31 PM UTC+3, Alexandr Nikitin wrote:
>
> I played with Transparent Hugepages some time ago and I want to share some 
> numbers based on a real-world high-load application.
> We have a JVM application: a high-load TCP server based on Netty. There is no 
> clear bottleneck; CPU, memory and network are all heavily loaded. The amount 
> of work depends on request content.
> The following numbers were taken under normal server load, ~40% of the maximum 
> number of requests one server can handle.
>
> *When THP is off:*
> End-to-end application latency in microseconds:
> "p50" : 718.891,
> "p95" : 4110.26,
> "p99" : 7503.938,
> "p999" : 15564.827,
>
> perf stat -e dTLB-load-misses,iTLB-load-misses -p PID -I 1000
> ...
> ...         25,164,369      iTLB-load-misses
> ...         81,154,170      dTLB-load-misses
> ...
>
> *When THP is always on:*
> End-to-end application latency in microseconds:
> "p50" : 601.196,
> "p95" : 3260.494,
> "p99" : 7104.526,
> "p999" : 11872.642,
>
> perf stat -e dTLB-load-misses,iTLB-load-misses -p PID -I 1000
> ...
> ...    21,400,513      dTLB-load-misses
> ...      4,633,644      iTLB-load-misses
> ...
>
> As you can see, the THP performance impact is measurable and too significant to 
> ignore: 4.1 ms vs 3.2 ms at p95, and about 100M vs 25M TLB misses per 
> one-second perf interval.
> I also used SystemTap to measure a few kernel functions such as 
> collapse_huge_page, clear_huge_page and split_huge_page. There were no 
> significant spikes with THP enabled.
> AFAIR that was a 3.10 kernel, which is 4 years old now. I can repeat the 
> experiments with newer kernels if there's interest. (I don't know what 
> has changed there, though.)
>
>
> On Monday, August 7, 2017 at 6:42:21 PM UTC+3, Peter Veentjer wrote:
>>
>> Hi Everyone,
>>
>> I'm failing to understand the problem with transparent huge pages.
>>
>> I 'understand' how normal pages work. A page is typically 4 KB in a 
>> virtual address space; each process has its own address space. 
>>
>> I understand how the TLB fits in: a cache of virtual-to-physical address 
>> mappings that speeds up address translation.
>>
>> I understand that using a large page, e.g. 2 MB instead of 4 KB, can 
>> reduce pressure on the TLB.
>>
>> So far, huge pages seem to make a lot of sense; of 
>> course at the expense of wasting memory if only a small section of a page 
>> is being used. 
>>
>> The first part I don't understand is: why is it called transparent huge 
>> pages? So what is transparent about it? 
>>
>> The second part I'm failing to understand is: why can it cause problems? 
>> There are quite a few applications that recommend disabling THP, and I 
>> recently worked with a customer who was helped by disabling it. It seems there 
>> is more going on behind the scenes than just an increased page size. Is 
>> it caused by fragmentation? That is, if a new huge page is needed and memory 
>> is fragmented (due to smaller pages), do the small pages need to be compacted 
>> before a new huge page can be allocated? But if that were the only 
>> issue, it shouldn't be a problem once all pages for the application have 
>> been touched and all pages are retained.
>>
>> So I'm probably missing something simple.
>>
>>
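
When repeating the "THP off" vs "THP always on" comparison above, it helps to record which mode the kernel was actually in for each run. A minimal sketch, assuming the standard Linux sysfs file `/sys/kernel/mm/transparent_hugepage/enabled`, whose content lists the available modes with the active one in square brackets (for example `always madvise [never]`). The function names are hypothetical, and some distribution kernels expose the setting under a different path.

```python
from pathlib import Path

# Standard sysfs location on mainline Linux kernels (assumption: not a
# distribution kernel that relocates it).
THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def active_thp_mode(text):
    """Return the bracketed option from e.g. 'always madvise [never]'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def read_thp_mode(path=THP_ENABLED):
    """Read the current THP mode, or None if the file is unavailable."""
    try:
        return active_thp_mode(path.read_text())
    except OSError:
        return None  # not Linux, THP not built in, or no permission


print("THP mode:", read_thp_mode())
```

Logging this value next to each latency and `perf stat` sample makes it harder to mix up runs taken under different settings.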
