[
https://issues.apache.org/jira/browse/HDFS-11851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16166959#comment-16166959
]
Ruslan Dautkhanov commented on HDFS-11851:
------------------------------------------
After applying this patch, the program started dumping core. Here is the gdb backtrace:
{code}
(gdb) bt
#0 0x00007fe78a34b1d7 in raise () from /lib64/libc.so.6
#1 0x00007fe78a34c8c8 in abort () from /lib64/libc.so.6
#2 0x00007fe78b212185 in os::abort(bool) () from /usr/java/default/jre/lib/amd64/server/libjvm.so
#3 0x00007fe78b3b4593 in VMError::report_and_die() () from /usr/java/default/jre/lib/amd64/server/libjvm.so
#4 0x00007fe78b21768f in JVM_handle_linux_signal () from /usr/java/default/jre/lib/amd64/server/libjvm.so
#5 0x00007fe78b20dbe3 in signalHandler(int, siginfo*, void*) () from /usr/java/default/jre/lib/amd64/server/libjvm.so
#6 <signal handler called>
#7 0x00007fe78a6db8b0 in setTLSExceptionStrings () from /opt/cloudera/parcels/CDH/lib64/libhdfs.so.0.0.0
#8 0x00007fe78a6da52c in printExceptionAndFreeV () from /opt/cloudera/parcels/CDH/lib64/libhdfs.so.0.0.0
#9 0x00007fe78a6da6cd in printExceptionAndFree () from /opt/cloudera/parcels/CDH/lib64/libhdfs.so.0.0.0
#10 0x00007fe78a6db60b in getJNIEnv () from /opt/cloudera/parcels/CDH/lib64/libhdfs.so.0.0.0
#11 0x00007fe78a6dd034 in hdfsBuilderConnect () from /opt/cloudera/parcels/CDH/lib64/libhdfs.so.0.0.0
#12 0x0000000000400950 in main ()
{code}
As you can see, the crash happens in setTLSExceptionStrings(), so it is very
likely related to this patch.
I can upload the hs_err*.log file if that would be helpful.
> getGlobalJNIEnv() may deadlock if exception is thrown
> -----------------------------------------------------
>
> Key: HDFS-11851
> URL: https://issues.apache.org/jira/browse/HDFS-11851
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: libhdfs
> Affects Versions: 3.0.0-alpha4
> Reporter: Henry Robinson
> Assignee: Sailesh Mukil
> Priority: Blocker
> Fix For: 3.0.0-alpha4
>
> Attachments: HDFS-11851.000.patch, HDFS-11851.001.patch,
> HDFS-11851.002.patch, HDFS-11851.003.patch, HDFS-11851.004.patch,
> HDFS-11851.005.patch
>
>
> HDFS-11529 introduced a deadlock into {{getGlobalJNIEnv()}} if an exception
> is thrown. {{getGlobalJNIEnv()}} holds {{jvmMutex}}, but
> {{printExceptionAndFree()}} will eventually try to acquire that lock in
> {{setTLSExceptionStrings()}}.
> The exception may come from the call to {{loadFileSystems}}:
> {code}
> jthr = invokeMethod(env, NULL, STATIC, NULL,
> "org/apache/hadoop/fs/FileSystem",
> "loadFileSystems", "()V");
> if (jthr) {
> printExceptionAndFree(env, jthr, PRINT_EXC_ALL,
> "loadFileSystems");
> }
> }
> {code}
> and here are the relevant parts of the stack trace from where I call this API
> in Impala, which uses {{libhdfs}}:
> {code}
> #0 __lll_lock_wait () at
> ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
> #1 0x00007ffff4a8d657 in _L_lock_909 () from
> /lib/x86_64-linux-gnu/libpthread.so.0
> #2 0x00007ffff4a8d480 in __GI___pthread_mutex_lock (mutex=0x47ce960
> <jvmMutex>) at ../nptl/pthread_mutex_lock.c:79
> #3 0x0000000002f06056 in mutexLock (m=<optimized out>) at
> /data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/os/posix/mutexes.c:28
> #4 0x0000000002efe817 in setTLSExceptionStrings (rootCause=0x0,
> stackTrace=0x0) at
> /data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/jni_helper.c:581
> #5 0x0000000002f065d7 in printExceptionAndFreeV (env=0x513c1e8,
> exc=0x508a8c0, noPrintFlags=<optimized out>, fmt=0x34349cf "loadFileSystems",
> ap=0x7fffffffb660)
> at
> /data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/exception.c:183
> #6 0x0000000002f0683d in printExceptionAndFree (env=<optimized out>,
> exc=<optimized out>, noPrintFlags=<optimized out>, fmt=<optimized out>)
> at
> /data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/exception.c:213
> #7 0x0000000002eff60f in getGlobalJNIEnv () at
> /data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/jni_helper.c:463
> {code}
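
For readers following the quoted description, here is a minimal sketch of the locking pattern it describes: one function holds a mutex and, on its error path, calls a helper that tries to take the same mutex. This is not libhdfs code. The names demo_mutex, demo_set_exception_strings, demo_get_env_error_path and run_demo are hypothetical stand-ins, and the sketch assumes a POSIX error-checking mutex (PTHREAD_MUTEX_ERRORCHECK) so that the nested lock attempt reports EDEADLK instead of blocking. With an ordinary non-recursive mutex the same call sequence would typically block forever, as in frame #0 of the quoted trace.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for libhdfs's jvmMutex. */
static pthread_mutex_t demo_mutex;

/* Plays the role of setTLSExceptionStrings(): it takes the lock itself. */
static int demo_set_exception_strings(void)
{
    int rc = pthread_mutex_lock(&demo_mutex);
    if (rc != 0) {
        return rc; /* EDEADLK when the calling thread already owns the lock */
    }
    /* ... the exception strings would be stored here ... */
    return pthread_mutex_unlock(&demo_mutex);
}

/* Plays the role of getGlobalJNIEnv() on its error path: the lock is
 * still held when the exception-reporting helper is called. */
static int demo_get_env_error_path(void)
{
    int rc, unlock_rc;

    rc = pthread_mutex_lock(&demo_mutex);
    if (rc != 0) {
        return rc;
    }
    rc = demo_set_exception_strings(); /* nested lock attempt */
    unlock_rc = pthread_mutex_unlock(&demo_mutex);
    assert(unlock_rc == 0);
    (void)unlock_rc;
    return rc;
}

/* Sets up an error-checking mutex, runs the error path once, and returns
 * the result of the nested lock attempt. */
int run_demo(void)
{
    pthread_mutexattr_t attr;
    int rc;

    rc = pthread_mutexattr_init(&attr);
    if (rc != 0) {
        return rc;
    }
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (rc == 0) {
        rc = pthread_mutex_init(&demo_mutex, &attr);
    }
    pthread_mutexattr_destroy(&attr);
    if (rc != 0) {
        return rc;
    }

    rc = demo_get_env_error_path();
    printf("nested lock attempt returned %d (%s)\n", rc, strerror(rc));
    pthread_mutex_destroy(&demo_mutex);
    return rc;
}
```

Calling run_demo() from a small main() and building with -pthread should show the nested attempt failing with EDEADLK. The two usual ways out of this pattern are to release the lock before reporting the error, or to make the helper not need the lock at all; which of these the attached patches take is not shown here.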