While running one of our internal benchmarks, which uses hugepages, we observe
the following hang when the system runs out of hugepages. The hung tasks show
this backtrace:

#0  0x0000040003f5c478 in __lll_lock_wait_private (futex=0x40004000870) at ../nptl/sysdeps/unix/sysv/linux/lowlevellock.c:34
#1  0x0000040003ed3e0c in __libc_malloc (bytes=44) at malloc.c:3657
#2  0x0000040003e80bdc in _nl_make_l10nflist (l10nfile_list=0x40003fffca8, dirlist=0x40003faf040 "/usr/share/locale",
    dirlist_len=18, mask=<value optimized out>, language=0xffff067ae10 "en_US", territory=0x0, codeset=0x0,
    normalized_codeset=0x0, modifier=0x0, filename=0xffff067ae30 "LC_MESSAGES/libc.mo", do_allocate=0) at l10nflist.c:193
#3  0x0000040003e7e8c8 in _nl_find_domain (dirname=0x40003faf040 "/usr/share/locale", locale=0xffff067ae10 "en_US",
    domainname=0xffff067ae30 "LC_MESSAGES/libc.mo", domainbinding=0x0) at finddomain.c:88
#4  0x0000040003e7e104 in __dcigettext (domainname=0x40003faef58 "libc", msgid1=0x40003faf6a8 "Cannot allocate memory",
    msgid2=0x0, plural=<value optimized out>, n=0, category=<value optimized out>) at dcigettext.c:628
#5  0x0000040003e7ca10 in __dcgettext (domainname=<value optimized out>, msgid=<value optimized out>,
    category=<value optimized out>) at dcgettext.c:53
#6  0x0000040003ed96d4 in __strerror_r (errnum=<value optimized out>, buf=0x0, buflen=0) at _strerror.c:65
#7  0x0000040003ed95cc in strerror (errnum=<value optimized out>) at strerror.c:33
#8  0x000004000008d60c in ?? () from /usr/lib64/libhugetlbfs.so
#9  0x0000040003ed3338 in sYSMALLOc (av=0x40004000870, bytes=237753176) at malloc.c:3197
#10 _int_malloc (av=0x40004000870, bytes=237753176) at malloc.c:4747
#11 0x0000040003ed3c3c in __libc_malloc (bytes=237753176) at malloc.c:3660
#12 0x000004000055a9ac in .sftcr3d () from /usr/lib64/libpesslsmp.so.1
#13 0x0000040000457b9c in .pscrft3 () from /usr/lib64/libpesslsmp.so.1
#14 0x0000000010003a18 in .run_parallel ()
#15 0x0000000010002990 in .main ()

We are using libhugetlbfs 2.12-2.el6 and gcc-4.4.6-3.el6. It looks like a
deadlock in malloc: when glibc calls into libhugetlbfs for more hugepage
memory and that allocation fails, strerror() is called, which in turn tries
to allocate memory. The task then deadlocks while trying to acquire the
malloc arena lock that it already holds (the futex address in frame #0,
0x40004000870, is the same as the arena pointer av in frames #9 and #10).
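The failure path above implies that code running inside malloc's slow path
(frames #9 to #11) must not call anything that can allocate. Frame #6 shows
that __strerror_r() itself goes through __dcgettext(), so switching to
strerror_r() would likely not help either. A minimal sketch of a
non-allocating alternative, assuming the hook only needs to report the
numeric errno value; format_errno_msg() and report_alloc_failure() are
hypothetical names, not libhugetlbfs functions:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: formats "<prefix>: errno <n>\n" into a caller-supplied
 * buffer. It uses only the stack (no malloc, no strerror(), no gettext/locale
 * lookup), so it cannot re-enter the allocator. Returns the message length
 * (excluding the NUL), or 0 if the buffer is too small. */
size_t format_errno_msg(char *buf, size_t cap, const char *prefix, int err)
{
    char digits[12];                /* enough for a 32-bit unsigned int */
    size_t nd = 0;
    size_t plen, need, pos = 0;
    /* Magnitude of err as unsigned; well-defined even for INT_MIN. */
    unsigned int u = err < 0 ? 0u - (unsigned int)err : (unsigned int)err;

    assert(buf != NULL && prefix != NULL);

    /* Collect decimal digits, least significant first. */
    do {
        digits[nd++] = (char)('0' + u % 10);
        u /= 10;
    } while (u != 0);

    plen = strlen(prefix);
    /* prefix + ": errno " + optional '-' + digits + '\n' */
    need = plen + (sizeof(": errno ") - 1) + (err < 0 ? 1 : 0) + nd + 1;
    if (need + 1 > cap)             /* +1 for the terminating NUL */
        return 0;

    memcpy(buf + pos, prefix, plen);
    pos += plen;
    memcpy(buf + pos, ": errno ", sizeof(": errno ") - 1);
    pos += sizeof(": errno ") - 1;
    if (err < 0)
        buf[pos++] = '-';
    while (nd > 0)                  /* emit digits most significant first */
        buf[pos++] = digits[--nd];
    buf[pos++] = '\n';
    buf[pos] = '\0';
    return pos;
}

/* Hypothetical failure path for an allocator hook: writes the message with
 * write(2), which does not allocate, instead of calling strerror(). */
void report_alloc_failure(const char *what, int err)
{
    char msg[128];
    size_t n = format_errno_msg(msg, sizeof msg, what, err);

    if (n > 0) {
        ssize_t rc = write(STDERR_FILENO, msg, n);
        (void)rc;                   /* nothing useful to do if stderr fails */
    }
}
```

Something like report_alloc_failure("hugepage allocation", errno) could stand
in for the strerror() call made from libhugetlbfs (frame #8). The trade-off is
losing the translated message text in exchange for never touching the arena
lock the thread already holds.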

After some googling, I found that a similar bug has been reported against
glibc at http://sourceware.org/bugzilla/show_bug.cgi?id=13699. Any thoughts
on what might be causing the tasks to hang?

Thanks,
Kamalesh.


_______________________________________________
Libhugetlbfs-devel mailing list
Libhugetlbfs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/libhugetlbfs-devel
