zmdaodao opened a new issue, #8854:
URL: https://github.com/apache/incubator-gluten/issues/8854
### Backend
VL (Velox)
### Bug description
[Environment configuration]
Spark: 3.4.4
Gluten: main and v1.3
Backend: Velox
Hadoop: 3.3.3
JDK: OpenJDK 1.8.0_441 and BiSheng JDK 1.8.0_352
[Steps to reproduce]
Using the TPC-DS test data, start spark-sql with this command:
spark-sql --properties-file $SPARK_HOME/conf/spark-defaults.conf_used --master local
Run the query "select * from date_dim limit 10". The result is displayed correctly.
Exit Spark (Ctrl+C or quit). The error occurs during exit.
[Observed behavior]
With OpenJDK 1.8.0_441, exiting Spark logs the warning "hdfs disconnect failure in HdfsReadFile close:".
With BiSheng JDK, exiting Spark aborts with a core dump.
**Logs with OpenJDK:**
25/02/26 13:42:50 INFO SparkContext: Successfully stopped SparkContext
I20250226 13:42:50.562902 1967003 EventBase.cpp:810] EventBase(): Received
terminateLoopSoon() command.
I20250226 13:42:50.563019 1966988 EventBase.cpp:794] EventBase
0xfff028008cc0 virtual void folly::EventBase::bumpHandlingTime() (loop) latest
18446744073709551578 next 18446744073709551579
I20250226 13:42:50.563082 1966988 EventBase.cpp:1058] latest
18446744073709551578 next 18446744073709551579
I20250226 13:42:50.563097 1966988 EventBase.cpp:801] EventBase
0xfff028008cc0 virtual void folly::EventBase::bumpHandlingTime() (loop)
startWork_ 39996343990598151
I20250226 13:42:50.563117 1966988 EventBase.cpp:794] EventBase
0xfff028008cc0 virtual void folly::EventBase::bumpHandlingTime() (loop) latest
18446744073709551579 next 18446744073709551579
I20250226 13:42:50.563129 1966988 EventBase.cpp:1058] latest
18446744073709551579 next 18446744073709551579
I20250226 13:42:50.563169 1966988 EventBase.cpp:1058] latest
18446744073709551579 next 18446744073709551579
I20250226 13:42:50.563146 1966988 EventBase.cpp:655] EventBase
0xfff028008cc0 did not timeout loop time guess: 3470412 idle time: 3470369
busy time: 43 avgLoopTime: 65.9659 maxLatencyLoopTime: 65.9659 maxLatency_: 0us
notificationQueueSize: 0 nothingHandledYet(): 0
I20250226 13:42:50.563189 1966988 EventBase.cpp:686] EventBase
0xfff028008cc0 loop time: 3470
I20250226 13:42:50.563200 1966988 EventBase.cpp:709] EventBase(): Done with
loop.
I20250226 13:42:50.563225 1966988 EventBase.cpp:794] EventBase
0xfff028008cc0 virtual void folly::EventBase::bumpHandlingTime() (loop) latest
18446744073709551579 next 18446744073709551579
I20250226 13:42:50.563235 1966988 EventBase.cpp:1058] latest
18446744073709551579 next 18446744073709551579
I20250226 13:42:50.563261 1966988 EventBase.cpp:374] EventBase(): Destroyed.
25/02/26 13:42:50 INFO ShutdownHookManager: Shutdown hook called
25/02/26 13:42:50 INFO ShutdownHookManager: Deleting directory
/tmp/spark-d8abde0a-5264-4e18-91ea-1f0ba86f4ab1
25/02/26 13:42:50 INFO ShutdownHookManager: Deleting directory
/tmp/spark-3bf78d2e-2fae-4dc1-bc8a-3f28a9c769e0
25/02/26 13:42:50 INFO ShutdownHookManager: Deleting directory
/tmp/spark-d563ea5c-ac87-41b3-b731-46a3d4bd225d
I20250226 13:42:50.916486 1966758 HdfsFileSystem.cpp:71] Disconnecting HDFS
file system
Call to AttachCurrentThread failed with error: -1
getJNIEnv: getGlobalJNIEnv failed
W20250226 13:42:50.916615 1966758 HdfsFileSystem.cpp:81] hdfs disconnect
failure in HdfsReadFile close: 255
**Logs with BiSheng JDK:**
25/02/26 13:46:53 INFO SparkContext: Successfully stopped SparkContext
I20250226 13:46:53.624395 1967784 EventBase.cpp:810] EventBase(): Received
terminateLoopSoon() command.
I20250226 13:46:53.624492 1967774 EventBase.cpp:794] EventBase
0xffefc8001340 virtual void folly::EventBase::bumpHandlingTime() (loop) latest
18446744073709551578 next 18446744073709551579
I20250226 13:46:53.624589 1967774 EventBase.cpp:1058] latest
18446744073709551578 next 18446744073709551579
I20250226 13:46:53.624603 1967774 EventBase.cpp:801] EventBase
0xffefc8001340 virtual void folly::EventBase::bumpHandlingTime() (loop)
startWork_ 39996587052105286
I20250226 13:46:53.624621 1967774 EventBase.cpp:794] EventBase
0xffefc8001340 virtual void folly::EventBase::bumpHandlingTime() (loop) latest
18446744073709551579 next 18446744073709551579
I20250226 13:46:53.624634 1967774 EventBase.cpp:1058] latest
18446744073709551579 next 18446744073709551579
I20250226 13:46:53.624663 1967774 EventBase.cpp:1058] latest
18446744073709551579 next 18446744073709551579
I20250226 13:46:53.624645 1967774 EventBase.cpp:655] EventBase
0xffefc8001340 did not timeout loop time guess: 2824868 idle time: 2824827
busy time: 41 avgLoopTime: -15790 maxLatencyLoopTime: -15790 maxLatency_: 0us
notificationQueueSize: 0 nothingHandledYet(): 0
I20250226 13:46:53.624680 1967774 EventBase.cpp:686] EventBase
0xffefc8001340 loop time: 2824
I20250226 13:46:53.624689 1967774 EventBase.cpp:709] EventBase(): Done with
loop.
I20250226 13:46:53.624713 1967774 EventBase.cpp:794] EventBase
0xffefc8001340 virtual void folly::EventBase::bumpHandlingTime() (loop) latest
18446744073709551579 next 18446744073709551579
I20250226 13:46:53.624725 1967774 EventBase.cpp:1058] latest
18446744073709551579 next 18446744073709551579
I20250226 13:46:53.624748 1967774 EventBase.cpp:374] EventBase(): Destroyed.
25/02/26 13:46:53 INFO ShutdownHookManager: Shutdown hook called
25/02/26 13:46:53 INFO ShutdownHookManager: Deleting directory
/tmp/spark-d782fc63-ef7c-4565-ab1c-72c2cfe5f016
25/02/26 13:46:53 INFO ShutdownHookManager: Deleting directory
/tmp/spark-c04cad6e-534f-49e8-aee4-d0a3affca857
25/02/26 13:46:53 INFO ShutdownHookManager: Deleting directory
/tmp/spark-494937b8-f21a-401f-9595-709c1e9bf4dc
I20250226 13:46:53.972867 1967630 HdfsFileSystem.cpp:71] Disconnecting HDFS
file system
Aborted
Core dump stack trace:
Stack: [0x0000fff1152a0000,0x0000fff1154a0000], sp=0x0000fff11549d760, free space=2037k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C [libhdfs.so+0x3568] methodIdFromClass+0xc8
C [libhdfs.so+0x35d0] invokeMethodOnJclass+0x50
C [libhdfs.so+0x3bcc] invokeMethod+0xcc
C [libhdfs.so+0x6844] hdfsDisconnect+0x44
C [libvelox.so+0xa55f604] facebook::velox::filesystems::arrow::io::internal::LibHdfsShim::Disconnect(hdfs_internal*)+0x20
C [libvelox.so+0xa34459c] facebook::velox::filesystems::HdfsFileSystem::Impl::~Impl()+0x68
C [libvelox.so+0xa346018] void std::_Destroy<facebook::velox::filesystems::HdfsFileSystem::Impl>(facebook::velox::filesystems::HdfsFileSystem::Impl*)+0x14
[Analysis]
I suspect the JVM has already been torn down by the time the HDFS disconnect is executed.
In the OpenJDK log, "getJNIEnv: getGlobalJNIEnv failed" appears after "25/02/26 13:42:50 INFO ShutdownHookManager: Shutdown hook called".
The shutdown hook is registered in the Spark code:
// Add a shutdown hook to delete the temp dirs when the JVM exits
logDebug("Adding shutdown hook") // force eager creation of logger
addShutdownHook(TEMP_DIR_SHUTDOWN_PRIORITY) { () =>
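
To make the suspected ordering concrete, here is a toy Python model. It is only a sketch: it calls nothing in Spark, Velox, or libhdfs, and `FakeJvm`, `FakeHdfsClient`, `run`, and the priority values are hypothetical names chosen for illustration. It assumes, as the analysis above suggests, that shutdown hooks run first, the JVM then goes away, and the native destructor that calls disconnect runs last.

```python
# Toy model of the shutdown ordering suspected in this report.
# Nothing here calls Spark, Velox, or libhdfs; all names are hypothetical.
import heapq
import itertools


class FakeJvm:
    """Stands in for the JVM: tracks liveness and priority-ordered hooks."""

    def __init__(self):
        self.alive = True
        self._hooks = []  # min-heap keyed on negated priority
        self._seq = itertools.count()  # tie-breaker keeps heap entries comparable

    def add_shutdown_hook(self, priority, fn):
        heapq.heappush(self._hooks, (-priority, next(self._seq), fn))

    def shutdown(self):
        # Assumption: higher-priority hooks run first, then the VM goes away.
        while self._hooks:
            _, _, fn = heapq.heappop(self._hooks)
            fn()
        self.alive = False


class FakeHdfsClient:
    """Stands in for a native HDFS handle whose disconnect needs a live JVM."""

    def __init__(self, jvm):
        self.jvm = jvm
        self.connected = True

    def disconnect(self):
        if not self.connected:
            return "already disconnected"
        if not self.jvm.alive:
            # Models the 'AttachCurrentThread failed' symptom in the logs.
            return "failed: JVM already gone"
        self.connected = False
        return "ok"


def run(disconnect_in_hook):
    """Return the event log for one simulated process exit."""
    events = []
    jvm = FakeJvm()
    client = FakeHdfsClient(jvm)
    # Priorities are illustrative values, not taken from Spark.
    jvm.add_shutdown_hook(25, lambda: events.append("delete temp dirs"))
    if disconnect_in_hook:
        jvm.add_shutdown_hook(
            10, lambda: events.append("hook disconnect: " + client.disconnect())
        )
    jvm.shutdown()
    # Assumption from the analysis: the native destructor runs after the JVM is gone.
    events.append("destructor disconnect: " + client.disconnect())
    return events
```

In this model, `run(False)` ends with a failed disconnect in the destructor, which matches the symptom in the logs, while `run(True)` disconnects inside a hook while the JVM is still alive. Whether a real fix should disconnect earlier or skip the call once the JVM is gone is left open here.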
### Spark version
Spark-3.4.x
### Spark configurations
spark.plugins=org.apache.gluten.GlutenPlugin
spark.driver.extraClassPath=/gluten_jars/gluten-velox-bundle-spark3.4_2.12.jar
spark.executor.extraClassPath=gluten-velox-bundle-spark3.4_2.12.jar
spark.jars=/gluten_jars/gluten-velox-bundle-spark3.4_2.12.jar
spark.shuffle.manager=org.apache.spark.shuffle.sort.ColumnarShuffleManager
spark.gluten.loadLibFromJar=false
spark.gluten.sql.debug=true
spark.gluten.sql.columnar.backend.lib=velox
spark.gluten.sql.columnar.backend.velox.memoryCapRatio=0.75
spark.gluten.sql.columnar.logicalJoinOptimizeEnable=true
spark.gluten.sql.columnar.logicalJoinOptimizationLevel=19
### System information
_No response_
### Relevant logs
```bash
```