[
https://issues.apache.org/jira/browse/HDFS-16021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18029943#comment-18029943
]
lifulong edited comment on HDFS-16021 at 10/15/25 2:04 AM:
-----------------------------------------------------------
Hi [~jeremy.coulon], glad to find this issue. We're experiencing the same
problem: a Java process using libhdfs crashes in the {{hdfsThreadDestructor}}
function with a core dump. The {{env *env}} pointer is not null. We suspected
the crash might be related to JVM shutdown, but there is no clear evidence for
that, since the logs show other threads were still processing data normally at
the time.
Why hasn't anyone reviewed this patch yet?
> heap-use-after-free in hdfsThreadDestructor
> -------------------------------------------
>
> Key: HDFS-16021
> URL: https://issues.apache.org/jira/browse/HDFS-16021
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.9.1, 2.9.2, 3.3.0
> Reporter: Jeremy Coulon
> Priority: Major
> Attachments: fix-hdfsThreadDestructor.patch, hdfs-asan.log
>
>
> Related to HDFS-12628 HDFS-13585 HDFS-14488 HDFS-15270
>
> We have experienced crashes in libhdfs hdfsThreadDestructor() for a long
> time. The crash is almost systematic with OpenJ9 and more sporadic with the
> HotSpot JVM.
>
> I finally got to the root cause of this bug thanks to AddressSanitizer. This
> is quite difficult to set up because you need to rebuild the test case,
> Hadoop, and openjdk-hotspot >= 13 with specific compiler options.
>
> See hdfs-asan.log for details.
>
> *Analysis:*
> In hdfsThreadDestructor(), you are making several JNI calls in order to
> detach the thread from the JVM:
>
> {code:java}
> /* Detach the current thread from the JVM */
> if (env) {
>     ret = (*env)->GetJavaVM(env, &vm);
>     /*
>      * More code here...
>      */
> }{code}
> This is fine if the thread was created in the C/C++ world.
>
> However, if the thread was created in the Java world, this is absolutely
> wrong. When a Java thread terminates, the JVM deallocates some memory which
> contains (among other things) the thread-specific JNIEnv. Only then is
> hdfsThreadDestructor() called. The *env* variable is not NULL but points
> to memory which was just released. This is the heap-use-after-free detected
> by ASan.
>
> I have been working on a patch that fixes the issue (see attachment).
>
> Here is the idea:
> * In hdfsThreadDestructor(), we need to know if the thread was created by
> Java or C/C++. If it was created by C/C++, we should make JNI calls in order
> to detach the current thread. If it was created by Java, we don't need to
> make any JNI call: the thread is already detached.
> * In getGlobalJNIEnv(), we can detect if the thread was created by Java or
> C/C++. It can be done by calling *vm->GetEnv()*. Then we store this
> information inside ThreadLocalState.
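
The idea in the quoted description can be sketched without a JVM. The following is a minimal illustration, not the attached patch: {{struct thread_state}}, the {{attached_by_native}} flag, and {{fake_detach_current_thread()}} are hypothetical stand-ins for the real ThreadLocalState and JNI calls. In real code the flag would presumably be set from the result of GetEnv() (JNI_EDETACHED means the thread was not yet attached, so native code has to attach it and later detach it). The sketch only shows a pthread key destructor that skips all "JNI" work for threads the JVM owns.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-thread state; the real ThreadLocalState has other fields. */
struct thread_state {
    void *env;              /* stands in for the thread's JNIEnv pointer */
    int attached_by_native; /* 1 if native code attached the thread to the JVM */
};

static pthread_key_t state_key;
static pthread_once_t key_once = PTHREAD_ONCE_INIT;
static int detach_calls; /* counts simulated detach calls */

/* Stand-in for (*vm)->DetachCurrentThread(vm); no JVM is involved here. */
static void fake_detach_current_thread(void)
{
    detach_calls++;
}

/* Runs at thread exit for every thread that stored a state pointer. */
static void thread_destructor(void *v)
{
    struct thread_state *state = v;

    if (state->attached_by_native) {
        /* Native thread: we attached it, so we are responsible for detaching. */
        fake_detach_current_thread();
    }
    /* Java thread: env may already be freed by the JVM, so never touch it. */
    free(state);
}

static void make_key(void)
{
    pthread_key_create(&state_key, thread_destructor);
}

static void *worker(void *arg)
{
    struct thread_state *state = calloc(1, sizeof(*state));

    if (!state) {
        return NULL;
    }
    /* Assumption: real code would derive this flag from the GetEnv() result. */
    state->attached_by_native = *(int *)arg;
    pthread_once(&key_once, make_key);
    if (pthread_setspecific(state_key, state) != 0) {
        free(state);
    }
    return NULL;
}

/* Runs one worker thread; returns how many detach calls its exit triggered. */
int detach_calls_for_thread(int created_by_native)
{
    pthread_t thread;
    int before = detach_calls;

    if (pthread_create(&thread, NULL, worker, &created_by_native) != 0) {
        return -1;
    }
    pthread_join(thread, NULL); /* the destructor has run once join returns */
    return detach_calls - before;
}
```

Calling {{detach_calls_for_thread(1)}} exercises the native-thread path (one simulated detach), while {{detach_calls_for_thread(0)}} models a Java-created thread, for which the destructor only frees its own state.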