[ https://issues.apache.org/jira/browse/HDFS-16084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17870058#comment-17870058 ]
ASF GitHub Bot commented on HDFS-16084:
---------------------------------------

kevincai opened a new pull request, #6969:
URL: https://github.com/apache/hadoop/pull/6969

### Description of PR
The JNI-state TLS object is stored before all of the initialization checks have passed. If a later check fails, the TLS slot is left in an invalid state, and a subsequent call hits a pointer to an object that has already been destroyed. The fix stores the TLS object last, only after all checks have passed.

### How was this patch tested?
A sample binary is added to expose the issue and verify the fix.

### For code changes:
- [x] Does the title or this PR starts with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
- [ ] Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
- [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
- [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, `NOTICE-binary` files?

> getJNIEnv() returns invalid pointer when called twice after getGlobalJNIEnv() failed
> ------------------------------------------------------------------------------------
>
>                 Key: HDFS-16084
>                 URL: https://issues.apache.org/jira/browse/HDFS-16084
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: libhdfs
>    Affects Versions: 3.2.1
>            Reporter: Antoine Pitrou
>            Priority: Major
>
> First reported in ARROW-13011: when a libhdfs API call fails because CLASSPATH isn't set, calling the API a second time leads to a crash.
> *Backtrace* > This was obtained from the ARROW-13011 reproducer: > {code:java} > #0 globalClassReference (className=className@entry=0x7f75883c13b0 > "org/apache/hadoop/conf/Configuration", env=env@entry=0x6c2f2f3a73666468, > out=out@entry=0x7fffd86e3020) at > /build/source/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/jni_helper.c:279 > #1 0x00007f75883b9511 in constructNewObjectOfClass > (env=env@entry=0x6c2f2f3a73666468, out=out@entry=0x7fffd86e3148, > className=className@entry=0x7f75883c13b0 > "org/apache/hadoop/conf/Configuration", > ctorSignature=ctorSignature@entry=0x7f75883c1180 "()V") > at > /build/source/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/jni_helper.c:212 > #2 0x00007f75883bb6d0 in hdfsBuilderConnect (bld=0x5562e4bbb3e0) at > /build/source/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/hdfs.c:700 > #3 0x00007f758de31ef3 in arrow::io::internal::LibHdfsShim::BuilderConnect > (this=0x7f758e768240 <arrow::io::internal::(anonymous > namespace)::libhdfs_shim>, > bld=0x5562e4bbb3e0) at /arrow/cpp/src/arrow/io/hdfs_internal.cc:366 > #4 0x00007f758de2d098 in > arrow::io::HadoopFileSystem::HadoopFileSystemImpl::Connect > (this=0x5562e4a9f750, config=0x5562e46edc30) > at /arrow/cpp/src/arrow/io/hdfs.cc:372 > #5 0x00007f758de2e646 in arrow::io::HadoopFileSystem::Connect > (config=0x5562e46edc30, fs=0x5562e46edd08) at > /arrow/cpp/src/arrow/io/hdfs.cc:590 > #6 0x00007f758d532d2a in arrow::fs::HadoopFileSystem::Impl::Init > (this=0x5562e46edc30) at /arrow/cpp/src/arrow/filesystem/hdfs.cc:59 > #7 0x00007f758d536931 in arrow::fs::HadoopFileSystem::Make (options=..., > io_context=...) 
at /arrow/cpp/src/arrow/filesystem/hdfs.cc:409 > #8 0x00007f75885d7445 in > __pyx_pf_7pyarrow_5_hdfs_16HadoopFileSystem___init__ > (__pyx_v_self=0x7f758871a970, __pyx_v_host=0x7f758871cc00, __pyx_v_port=8020, > __pyx_v_user=0x5562e3af6d30 <_Py_NoneStruct>, __pyx_v_replication=3, > __pyx_v_buffer_size=0, __pyx_v_default_block_size=0x5562e3af6d30 > <_Py_NoneStruct>, > __pyx_v_kerb_ticket=0x5562e3af6d30 <_Py_NoneStruct>, > __pyx_v_extra_conf=0x5562e3af6d30 <_Py_NoneStruct>) at _hdfs.cpp:4759 > #9 0x00007f75885d4c88 in > __pyx_pw_7pyarrow_5_hdfs_16HadoopFileSystem_1__init__ > (__pyx_v_self=0x7f758871a970, __pyx_args=0x7f75900bb048, > __pyx_kwds=0x7f7590033a68) > at _hdfs.cpp:4343 > #10 0x00005562e38ca747 in type_call () at > /home/conda/feedstock_root/build_artifacts/python_1613711361059/work/Objects/typeobject.c:915 > #11 0x00005562e39117a3 in _PyObject_FastCallDict (kwargs=<optimized out>, > nargs=<optimized out>, args=<optimized out>, > func=0x7f75885f1420 <__pyx_type_7pyarrow_5_hdfs_HadoopFileSystem>) at > /home/conda/feedstock_root/build_artifacts/python_1613711361059/work/Objects/tupleobject.c:76 > #12 _PyObject_FastCallKeywords () at > /home/conda/feedstock_root/build_artifacts/python_1613711361059/work/Objects/abstract.c:2496 > #13 0x00005562e39121d5 in call_function () at > /home/conda/feedstock_root/build_artifacts/python_1613711361059/work/Python/ceval.c:4875 > #14 0x00005562e3973d68 in _PyEval_EvalFrameDefault () at > /home/conda/feedstock_root/build_artifacts/python_1613711361059/work/Python/ceval.c:3351 > #15 0x00005562e38b98f5 in PyEval_EvalFrameEx (throwflag=0, f=0x7f74c0664768) > at > /home/conda/feedstock_root/build_artifacts/python_1613711361059/work/Python/ceval.c:4166 > #16 _PyEval_EvalCodeWithName () at > /home/conda/feedstock_root/build_artifacts/python_1613711361059/work/Python/ceval.c:4166 > #17 0x00005562e38bad79 in PyEval_EvalCodeEx (_co=<optimized out>, > globals=<optimized out>, locals=<optimized out>, args=<optimized out>, > 
argcount=<optimized out>, > kws=<optimized out>, kwcount=0, defs=0x0, defcount=0, kwdefs=0x0, > closure=0x0) > at > /home/conda/feedstock_root/build_artifacts/python_1613711361059/work/Python/ceval.c:4187 > #18 0x00005562e398b6eb in PyEval_EvalCode (co=<optimized out>, > globals=<optimized out>, locals=<optimized out>) > at > /home/conda/feedstock_root/build_artifacts/python_1613711361059/work/Python/ceval.c:731 > #19 0x00005562e39f30e3 in run_mod () at > /home/conda/feedstock_root/build_artifacts/python_1613711361059/work/Python/pythonrun.c:1025 > #20 0x00005562e3896dd3 in PyRun_InteractiveOneObjectEx (fp=0x7f758f30aa00 > <_IO_2_1_stdin_>, filename=0x7f75900391b8, flags=0x7fffd86e40bc) > at > /home/conda/feedstock_root/build_artifacts/python_1613711361059/work/Python/pythonrun.c:246 > #21 0x00005562e3896f85 in PyRun_InteractiveLoopFlags (fp=0x7f758f30aa00 > <_IO_2_1_stdin_>, filename_str=<optimized out>, flags=0x7fffd86e40bc) > at > /home/conda/feedstock_root/build_artifacts/python_1613711361059/work/Python/pythonrun.c:114 > #22 0x00005562e3897024 in PyRun_AnyFileExFlags (fp=0x7f758f30aa00 > <_IO_2_1_stdin_>, filename=0x5562e3a32ee6 "<stdin>", closeit=0, > flags=0x7fffd86e40bc) > at > /home/conda/feedstock_root/build_artifacts/python_1613711361059/work/Python/pythonrun.c:75 > #23 0x00005562e39f8cc7 in run_file (p_cf=0x7fffd86e40bc, filename=<optimized > out>, fp=0x7f758f30aa00 <_IO_2_1_stdin_>) > at > /home/conda/feedstock_root/build_artifacts/python_1613711361059/work/Modules/main.c:340 > #24 Py_Main () at > /home/conda/feedstock_root/build_artifacts/python_1613711361059/work/Modules/main.c:810 > #25 0x00005562e389bf77 in main (argc=1, argv=0x7fffd86e42c8) at > /home/conda/feedstock_root/build_artifacts/python_1613711361059/work/Programs/python.c:69 > {code} > *Analysis* > The first time {{getJNIEnv()}} is called, no thread-local state is registered > yet. 
> It therefore starts by doing three initialization steps:
> 1) allocate a new {{ThreadLocalState}} structure on the heap
> 2) associate the POSIX thread-local state with the {{ThreadLocalState}} pointer
> 3) associate a native ({{__thread}}) shortcut with the {{ThreadLocalState}} pointer
> Then {{getGlobalJNIEnv()}} is called to actually fetch a valid JNI environment pointer. However, this call may fail (e.g. if CLASSPATH is not set properly). In that case, the following happens:
> 1) the {{ThreadLocalState}} is deallocated from the heap
> 2) and... that's all!
> Neither the POSIX thread-local state nor the native {{__thread}} shortcut is reinitialized. They still hold the {{ThreadLocalState}} pointer, but the corresponding memory was freed and returned to the allocator.
> The next time the user tries to call a libhdfs API, {{getJNIEnv()}} returns successfully... with an invalid pointer (or one pointing to random data). For example:
> {code}
> (gdb) p getJNIEnv()
> $2 = (JNIEnv *) 0x6c2f2f3a73666468
> (gdb) p *getJNIEnv()
> Cannot access memory at address 0x6c2f2f3a73666468
> {code}
> (0x6c2f2f3a73666468 is the little-endian representation of the string "hdfs://l")
> *Note*
> This analysis was done with Hadoop 3.2.1. However, examination of the 3.3.2 or trunk source code seems to show that {{getJNIEnv()}} hasn't changed in-between.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)