Awesome!!! With my job title most folks think I'm essentially technically neutered these days.
Good to see there is still some life in this old dog :-)

Best,

J.

On Thursday, July 9, 2015, mathog <[email protected]> wrote:

> On 09-Jul-2015 11:54, James Cuff wrote:
>
>> http://blog.jcuff.net/2015/04/of-huge-pages-and-huge-performance-hits.html
>
> Well, that seems to be it, but not quite with the same symptoms you
> observed. khugepaged never showed up, and "perf top" never revealed
> _spin_lock_irqsave. Instead, this is what "perf top" shows in my tests:
>
> (hugepage=always, when migration/# process observed)
>   89.97%  [kernel]  [k] compaction_alloc
>    1.21%  [kernel]  [k] compact_zone
>    1.18%  [kernel]  [k] get_pageblock_flags_group
>    0.75%  [kernel]  [k] __reset_isolation_suitable
>    0.57%  [kernel]  [k] clear_page_c_e
>
> (hugepage=always, when events/# process observed)
>   85.97%  [kernel]  [k] compaction_alloc
>    0.84%  [kernel]  [k] compact_zone
>    0.65%  [kernel]  [k] get_pageblock_flags_group
>    0.64%  perf      [.] 0x000000000005cff7
>
> (hugepage=never)
>   29.86%  [kernel]  [k] clear_page_c_e
>   21.88%  [kernel]  [k] copy_user_generic_string
>   12.46%  [kernel]  [k] __alloc_pages_nodemask
>    5.70%  [kernel]  [k] page_fault
>
> This is good, because "perf top" shows that the underlying issue is
> compaction_alloc and compact_zone, even though what top shows is
> migration/# in one case and, when locked to a CPU, events/#.
>
> Switching hugepage always->never seems to make things work right away.
> Switching hugepage never->always seems to take a while to break: to get
> it to start failing, many of the big files involved must be copied to
> /dev/null again, even though they were presumably already in the file
> cache.
>
> Searching for "compaction_alloc" and "compact_zone" turned up a
> suggestion here:
>
> https://structureddata.github.io/2012/06/18/linux-6-transparent-huge-pages-and-hadoop-workloads/
>
> to do:
>
> echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag
>
> (transparent_hugepage is a link to redhat_transparent_hugepage).
> Re-enabled hugepage and reproduced the painfully slow I/O, set defrag
> to "never", and the I/O was fast again, even though hugepage was still
> enabled.
>
> So on my machine the problem seems to be with hugepage defrag
> specifically. Disabling just that is sufficient to resolve the issue;
> it isn't necessary to take out all of hugepage. Will let it run that
> way for a while and see if anything else shows up.
>
> For future reference:
>
> CentOS release 6.6 (Final)
> kernel 2.6.32-504.23.4.el6.x86_64
> Dell Inc. PowerEdge T620/03GCPM, BIOS 2.2.2 01/16/2014
> 48 Intel Xeon CPU E5-2695 v2 @ 2.40GHz (in /proc/cpuinfo)
> RAM 529231456 kB (in /proc/meminfo)
>
> Thanks all!
>
> David Mathog
> [email protected]
> Manager, Sequence Analysis Facility, Biology Division, Caltech

--
(Via iPhone)
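[Editor's note: for anyone wanting to try the same check, here is a minimal sketch that wraps the commands above. It assumes the RHEL/CentOS 6-era sysfs layout described in the message; on mainline kernels the directory is /sys/kernel/mm/transparent_hugepage (no "redhat_" prefix), and writing the defrag file requires root. Treat it as a starting point, not a definitive tuning script.]

```shell
#!/bin/sh
# Inspect THP settings and disable only defrag, leaving THP enabled.
# Paths assume RHEL/CentOS 6 ("redhat_" prefix) or a mainline kernel.

# Find whichever THP directory this kernel exposes.
THP=""
for d in /sys/kernel/mm/redhat_transparent_hugepage \
         /sys/kernel/mm/transparent_hugepage; do
    if [ -d "$d" ]; then
        THP="$d"
        break
    fi
done

if [ -z "$THP" ]; then
    echo "no transparent hugepage support found" >&2
    exit 1
fi

# Show current settings; the active value appears in [brackets].
echo "enabled: $(cat "$THP/enabled")"
echo "defrag:  $(cat "$THP/defrag")"

# Disable just defrag -- on the machine described above this alone
# was enough to restore normal I/O speed. Requires root.
echo never > "$THP/defrag"
echo "defrag now: $(cat "$THP/defrag")"
```

[While the slowdown is occurring, `perf top` confirming that compaction_alloc and compact_zone dominate kernel time is the corroborating signal before flipping the switch.]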
_______________________________________________
Beowulf mailing list, [email protected] sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf
