[ https://issues.apache.org/jira/browse/HADOOP-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Todd Lipcon updated HADOOP-7156:
--------------------------------
Attachment: hadoop-7156.txt
Here's a heinous workaround for this issue:
When initializing the native library, we check for the problematic nss module.
If it's on the system, we're likely to run into this issue, so we add a lock
around the getpwuid_r calls.
Do people think this hack is the right approach? My other thought was to add a
new boolean config flag like hadoop.nativeio.workaround.hadoop7156 or some
nonsense like that, and if that flag is set, add the monitor lock.
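
For anyone who wants to picture the idea without opening the attachment, here
is a rough sketch in C. It is not the code in hadoop-7156.txt: the names
pw_lock, workaround_enabled, init_workaround and guarded_getpwuid_r are made up
for illustration, and checking for /lib64/libnss_sss.so.2 (the path seen in the
backtrace) is only an assumed way to detect the module. The same flag could
just as well be set from a config option instead.

```c
#include <pthread.h>
#include <pwd.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical names throughout; this is not the code in hadoop-7156.txt. */
static pthread_mutex_t pw_lock = PTHREAD_MUTEX_INITIALIZER;
static int workaround_enabled = 0;

/* Assumed detection: treat the presence of the sss NSS module (path taken
 * from the backtrace in the issue) as a sign that the bug may be hit.
 * Call once, while the native library is being initialized. */
static void init_workaround(void) {
    workaround_enabled = (access("/lib64/libnss_sss.so.2", F_OK) == 0);
}

/* Same signature as getpwuid_r; serializes callers when the flag is set. */
static int guarded_getpwuid_r(uid_t uid, struct passwd *pwd, char *buf,
                              size_t buflen, struct passwd **result) {
    const int locked = workaround_enabled;  /* read the flag once */
    int rc;

    if (locked) {
        pthread_mutex_lock(&pw_lock);
    }
    rc = getpwuid_r(uid, pwd, buf, buflen, result);
    if (locked) {
        pthread_mutex_unlock(&pw_lock);
    }
    return rc;
}
```

The lock only helps if every getpwuid_r call in the process goes through the
wrapper; calls made elsewhere (for example by the JVM itself) would still race.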
> getpwuid_r is not thread-safe on RHEL6
> --------------------------------------
>
> Key: HADOOP-7156
> URL: https://issues.apache.org/jira/browse/HADOOP-7156
> Project: Hadoop Common
> Issue Type: Bug
> Affects Versions: 0.22.0
> Environment: RHEL 6.0 "Santiago"
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Priority: Critical
> Fix For: 0.22.0
>
> Attachments: hadoop-7156.txt
>
>
> Due to the following bug in SSSD, functions like getpwuid_r are not
> thread-safe in RHEL 6.0 if sssd is specified in /etc/nsswitch.conf (as it is
> by default):
> https://fedorahosted.org/sssd/ticket/640
> This causes many fetch failures in the case that the native libraries are
> available, since the SecureIO functions call getpwuid_r as part of fstat. By
> enabling -Xcheck:jni I get the following trace on JVM crash:
> *** glibc detected *** /mnt/toolchain/JDK6u20-64bit/bin/java: free(): invalid
> pointer: 0x0000003575741d23 ***
> ======= Backtrace: =========
> /lib64/libc.so.6[0x3575675676]
> /lib64/libnss_sss.so.2(_nss_sss_getpwuid_r+0x11b)[0x7fe716cb42cb]
> /lib64/libc.so.6(getpwuid_r+0xdd)[0x35756a5dfd]