[
https://issues.apache.org/jira/browse/KUDU-1447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15280420#comment-15280420
]
Todd Lipcon commented on KUDU-1447:
-----------------------------------
As best I can tell, yes: all el6 systems are affected and no el7 systems are,
though I haven't tried to repro on el7. I think the
https://lwn.net/Articles/472097/ patch series was committed for kernel 3.2,
and el7 is based on a later kernel (3.10) iirc.
> Document recommendation to disable THP
> --------------------------------------
>
> Key: KUDU-1447
> URL: https://issues.apache.org/jira/browse/KUDU-1447
> Project: Kudu
> Issue Type: Improvement
> Components: documentation
> Affects Versions: 0.8.0
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
>
> While doing a bunch of cluster testing, I finally got to the root of why
> threads sometimes take several seconds to start up, causing various timeout
> issues, false elections, etc. It turns out that khugepaged does synchronous
> page compaction while holding a process's mmap semaphore, and when that's
> concurrent with lots of IO, it can block for several seconds.
> https://lkml.org/lkml/2011/7/26/103
> To avoid this, we should tell users to set the transparent hugepages (THP)
> "enabled" setting to "madvise" or "never" -- it's not sufficient to just
> disable defrag, because khugepaged still runs in the background in that case
> and causes this sporadic issue.
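
As a rough illustration of the recommendation above, here is a minimal Python sketch that checks whether THP is set to "madvise" or "never" on a host. It is not part of Kudu; the function names are hypothetical, and the two sysfs paths (the mainline location and the Red Hat-specific one used on el6) are assumptions that should be verified on the target system. The kernel marks the active mode with square brackets, e.g. "[always] madvise never".

```python
import re
from pathlib import Path

# Assumed sysfs locations for the THP "enabled" setting: the mainline path
# and the Red Hat-specific path used on el6. Verify these on your system.
THP_ENABLED_PATHS = (
    Path("/sys/kernel/mm/transparent_hugepage/enabled"),
    Path("/sys/kernel/mm/redhat_transparent_hugepage/enabled"),
)


def parse_thp_mode(contents: str) -> str:
    """Return the active mode, which the kernel wraps in square brackets."""
    match = re.search(r"\[(\w+)\]", contents)
    if match is None:
        raise ValueError(f"no bracketed mode found in {contents!r}")
    return match.group(1)


def thp_mode_is_safe(mode: str) -> bool:
    """True if the mode matches the recommendation: 'madvise' or 'never'."""
    return mode in ("madvise", "never")


def check_thp(paths=THP_ENABLED_PATHS):
    """Return (path, mode, is_safe) for the first existing path, else None."""
    for path in paths:
        if path.exists():
            mode = parse_thp_mode(path.read_text())
            return path, mode, thp_mode_is_safe(mode)
    return None


if __name__ == "__main__":
    result = check_thp()
    if result is None:
        print("No THP 'enabled' file found; THP may not be available here.")
    else:
        path, mode, is_safe = result
        status = "OK" if is_safe else "consider 'madvise' or 'never'"
        print(f"{path}: {mode} ({status})")
```

Note that this only inspects the "enabled" setting, since disabling defrag alone is not sufficient per the description above.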