[ 
https://issues.apache.org/jira/browse/KUDU-1447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15866609#comment-15866609
 ] 

Todd Lipcon commented on KUDU-1447:
-----------------------------------

This came up on a test cluster again today with THP enabled. Looking into it a 
bit, I think if we patched tcmalloc to use madvise(MADV_NOHUGEPAGE) just after 
mmap/sbrk, then our VMAs would avoid getting registered in khugepaged and 
wouldn't be subject to the scanning/compaction even if THP is otherwise enabled 
on the system. Might be easier than trying to get users to change their config.
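
As a rough illustration of the idea, and not tcmalloc's actual code, the sketch below maps an anonymous region and immediately marks it with madvise(MADV_NOHUGEPAGE). The helper name alloc_region_no_thp and its advice_rc out-parameter are hypothetical, and the sketch assumes a Linux system whose headers define MADV_NOHUGEPAGE. A failed madvise is treated as non-fatal here, since the mapping is still usable without the hint.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper (not part of tcmalloc): map `len` bytes of anonymous
 * memory and ask the kernel not to back it with transparent huge pages.
 * Returns the mapping, or NULL if mmap fails. If `advice_rc` is non-NULL it
 * receives the return value of madvise (0 on success, -1 on failure).
 */
static void *alloc_region_no_thp(size_t len, int *advice_rc) {
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        return NULL;
    }

    int rc = -1;
#ifdef MADV_NOHUGEPAGE
    /* Apply the hint right after mmap, before the region is handed out. */
    rc = madvise(p, len, MADV_NOHUGEPAGE);
#endif
    if (advice_rc != NULL) {
        *advice_rc = rc;
    }
    return p; /* Still usable even if the hint was rejected. */
}
```

A caller would use the returned pointer like any other mapping and release it with munmap(p, len). An allocator that grows the heap with sbrk would need the same madvise call on the newly extended range.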

> Document recommendation to disable THP
> --------------------------------------
>
>                 Key: KUDU-1447
>                 URL: https://issues.apache.org/jira/browse/KUDU-1447
>             Project: Kudu
>          Issue Type: Improvement
>          Components: documentation
>    Affects Versions: 0.8.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>
> Doing a bunch of cluster testing, I finally got to the root of why sometimes 
> threads take several seconds to start up, causing various timeout issues, 
> false elections, etc. It turns out that khugepaged does synchronous page 
> compaction while holding a process's mmap semaphore, and when that's 
> concurrent with lots of IO, it can block for several seconds.
> https://lkml.org/lkml/2011/7/26/103
> To avoid this, we should tell users to set transparent hugepages to 
> "madvise" or "never". It's not sufficient to just disable defrag, because 
> khugepaged still runs in the background in that case and causes this 
> sporadic issue.
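
For anyone wanting to check a host against this recommendation, here is a minimal sketch in C. It assumes the setting is exposed at /sys/kernel/mm/transparent_hugepage/enabled (a path not named in the issue, and one that may differ on some distributions) and that the file marks the active mode with square brackets, as in "always madvise [never]". All function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/*
 * Copy the bracketed token from text such as "always madvise [never]" into
 * `out`. Returns 0 on success, -1 if no bracketed token is found or `out`
 * is too small to hold it.
 */
static int thp_selected_mode(const char *text, char *out, size_t outlen) {
    const char *open = strchr(text, '[');
    const char *close = open ? strchr(open, ']') : NULL;
    if (close == NULL) {
        return -1;
    }
    size_t n = (size_t)(close - open - 1);
    if (n + 1 > outlen) {
        return -1;
    }
    memcpy(out, open + 1, n);
    out[n] = '\0';
    return 0;
}

/* Returns 1 if the mode is one the issue recommends, 0 otherwise. */
static int thp_mode_is_recommended(const char *mode) {
    return strcmp(mode, "madvise") == 0 || strcmp(mode, "never") == 0;
}

/*
 * Read the first line of `path` and classify the active mode.
 * Returns 1 if recommended, 0 if not, -1 if the file cannot be read
 * or parsed.
 */
static int thp_check_file(const char *path) {
    char line[128];
    char mode[32];
    FILE *f = fopen(path, "r");
    if (f == NULL) {
        return -1;
    }
    char *ok = fgets(line, sizeof line, f);
    fclose(f);
    if (ok == NULL || thp_selected_mode(line, mode, sizeof mode) != 0) {
        return -1;
    }
    return thp_mode_is_recommended(mode);
}
```

Calling thp_check_file("/sys/kernel/mm/transparent_hugepage/enabled") at startup and logging a warning when it returns 0 would be one way to surface the recommendation to users.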



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
