[ https://issues.apache.org/jira/browse/HADOOP-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13004258#comment-13004258 ]
Greg Roelofs commented on HADOOP-7156:
--------------------------------------
Doh! FF crashed while I was replying, sigh. Switching to e-mail:
> In my experience, we do a really bad job of keeping the wiki up to date.
> Greg, what do you think?
I agree--we're much better at keeping the code up to date (frequently in
parallel across multiple branches ;-) ) than at keeping the wiki current.
I think the XML config text is fine; you could optionally prefix it with
"As of March 2011, systems known to ..." as a hint to users or future versions
of us to recheck it if significant time has passed. The comment in NativeIO.c
probably should be modified; perhaps "monitor used for working around a bug
in the sssd security daemon, which was observed in getpwuid_r() on RHEL 6.0,"
or words to that effect. (Need not be that verbose, of course.)
I also agree with Eli that we can leave the workaround disabled for tests.
It might be worthwhile to add a log message at the start that "this test
may fail (crash) with an invalid free() on some systems; see HADOOP-7156
for details." Again, feel free to word it however you wish.
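For what it's worth, the log message suggested above could look roughly like the following. This is only a sketch: the class and method names are made up for illustration, and it uses java.util.logging to stay self-contained, whereas the real test would presumably use whatever logger it already has.

```java
import java.util.logging.Logger;

// Hypothetical helper, not part of the patch: emits the suggested warning
// once at the start of a test that runs with the workaround disabled.
final class WorkaroundDisabledNotice {
    // Wording taken from the suggestion above; adjust freely.
    static final String MESSAGE =
        "this test may fail (crash) with an invalid free() on some systems; "
        + "see HADOOP-7156 for details.";

    private WorkaroundDisabledNotice() {}

    // Logs the warning and returns the text so callers can reuse it.
    static String warn(Logger log) {
        log.warning(MESSAGE);
        return MESSAGE;
    }
}
```

A test would call WorkaroundDisabledNotice.warn(...) in its setup method, before any native call is made.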
Trivial grammo: "workaround" is a noun; the verb form is "work around"
(similar to layout, backup, setup, cleanup, checkin, cutoff, etc.). The
various variable names would be more proper if they reflected this (e.g.,
WORK_AROUND_NON_THREADSAFE_CALLS_KEY, workAroundNonThreadSafePasswdCalls
[or workAroundNonThreadsafePasswdCalls, since you're using "threadsafe"
as a single word elsewhere]), but I won't fuss if you leave them as is.
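To illustrate the naming only, here is a rough sketch with the verb form used for the constant and the field. Everything other than those two names is an assumption on my part: the key string, the class name, and the pure-Java lock are placeholders, and the lookup function merely stands in for the native getpwuid_r() call that the actual monitor guards.

```java
import java.util.function.IntFunction;

// Hypothetical illustration of the suggested names; not the patch itself.
final class WorkAroundSketch {
    // Assumed key string, shown only so the constant has a value.
    static final String WORK_AROUND_NON_THREADSAFE_CALLS_KEY =
        "hadoop.workaround.non.threadsafe.getpwuid";

    // Single process-wide lock that serializes passwd lookups.
    private static final Object PASSWD_LOCK = new Object();

    private final boolean workAroundNonThreadSafePasswdCalls;
    private final IntFunction<String> lookup; // stand-in for the native call

    WorkAroundSketch(boolean workAroundNonThreadSafePasswdCalls,
                     IntFunction<String> lookup) {
        this.workAroundNonThreadSafePasswdCalls = workAroundNonThreadSafePasswdCalls;
        this.lookup = lookup;
    }

    // Resolves a uid to a user name, holding the lock only when the
    // workaround is enabled.
    String userNameForUid(int uid) {
        if (workAroundNonThreadSafePasswdCalls) {
            synchronized (PASSWD_LOCK) {
                return lookup.apply(uid);
            }
        }
        return lookup.apply(uid);
    }
}
```

With the flag off, the lookup runs unguarded, which matches leaving the workaround disabled for tests.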
> getpwuid_r is not thread-safe on RHEL6
> --------------------------------------
>
> Key: HADOOP-7156
> URL: https://issues.apache.org/jira/browse/HADOOP-7156
> Project: Hadoop Common
> Issue Type: Bug
> Affects Versions: 0.22.0
> Environment: RHEL 6.0 "Santiago"
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Priority: Critical
> Fix For: 0.22.0
>
> Attachments: hadoop-7156.txt, hadoop-7156.txt, hadoop-7156.txt
>
>
> Due to the following bug in SSSD, functions like getpwuid_r are not
> thread-safe in RHEL 6.0 if sssd is specified in /etc/nsswitch.conf (as it is
> by default):
> https://fedorahosted.org/sssd/ticket/640
> This causes many fetch failures in the case that the native libraries are
> available, since the SecureIO functions call getpwuid_r as part of fstat. By
> enabling -Xcheck:jni I get the following trace on JVM crash:
> *** glibc detected *** /mnt/toolchain/JDK6u20-64bit/bin/java: free(): invalid
> pointer: 0x0000003575741d23 ***
> ======= Backtrace: =========
> /lib64/libc.so.6[0x3575675676]
> /lib64/libnss_sss.so.2(_nss_sss_getpwuid_r+0x11b)[0x7fe716cb42cb]
> /lib64/libc.so.6(getpwuid_r+0xdd)[0x35756a5dfd]
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira