xingnailu opened a new issue, #5875:
URL: https://github.com/apache/incubator-gluten/issues/5875
### Backend
VL (Velox)
### Bug description
I am using Gluten (tag v1.1.1) with Velox, folly, and Spark 3.4.2 on YARN.
The build was done on CentOS 8 aarch64 and runs on aarch64. While a YARN
container was reading S3 data, the executor aborted with
`Assertion failure: hp.second == srcChunk->tag(srcI)`.
### Spark version
None
### Spark configurations
spark.app.attempt.id | 1
-- | --
spark.app.id | application_1693383838041_3359264
spark.app.name | xxx
spark.app.startTime | 1716774272681
spark.app.submitTime | 1716774261696
spark.celeborn.master.endpoints | cem-0.cem.bigdata.svc.cluster.local:9097
spark.compact.default.filesystem | hdfs:/xxxx
spark.compact.smallfile.amount | 1000
spark.default.parallelism | 800
spark.driver.cores | 1
spark.driver.extraJavaOptions | -Djava.net.preferIPv6Addresses=false
-XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.lang=ALL-UNNAMED
--add-opens=java.base/java.lang.invoke=ALL-UNNAMED
--add-opens=java.base/java.lang.reflect=ALL-UNNAMED
--add-opens=java.base/java.io=ALL-UNNAMED
--add-opens=java.base/java.net=ALL-UNNAMED
--add-opens=java.base/java.nio=ALL-UNNAMED
--add-opens=java.base/java.util=ALL-UNNAMED
--add-opens=java.base/java.util.concurrent=ALL-UNNAMED
--add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED
--add-opens=java.base/jdk.internal.ref=ALL-UNNAMED
--add-opens=java.base/sun.nio.ch=ALL-UNNAMED
--add-opens=java.base/sun.nio.cs=ALL-UNNAMED
--add-opens=java.base/sun.security.action=ALL-UNNAMED
--add-opens=java.base/sun.util.calendar=ALL-UNNAMED
--add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED
-Djdk.reflect.useDirectMethodHandle=false -Ddubbo.application.qos.enable=false
-Duser.timeZone=GMT+08 -Dcom.amazonaws.services.s3.enableV4=true
-Djava.net.preferIPv4Stack=true -XX:MetaspaceSize=512m -XX:MaxMetaspaceSize=512m
-XX:MaxDirectMemorySize=2g -XX:+UseCompressedOops -XX:ParallelGCThreads=8
-XX:ConcGCThreads=4 -XX:+UseG1GC -XX:SoftRefLRUPolicyMSPerMB=0
-XX:OnOutOfMemoryError="kill -9 %p" -verbose:gc -XX:+PrintGCDetails
-XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC
-Xloggc:<LOG_DIR>/gc.log -XX:MaxDirectMemorySize=2048m
spark.driver.host | 100-64-115-176.bigdata.pod.cluster.local
spark.driver.memory | 3g
spark.driver.memoryOverhead | 2G
spark.driver.port | 44823
spark.dynamicAllocation.enabled | false
spark.dynamicAllocation.initialExecutors | 0
spark.dynamicAllocation.maxExecutors | 200
spark.dynamicAllocation.minExecutors | 0
spark.dynamicAllocation.schedulerBacklogTimeout | 10s
spark.eventLog.dir | s3a://xxx/igdata-sparkhistoryserver/jhs/
spark.eventLog.enabled | true
spark.executor.cores | 4
spark.executor.extraJavaOptions | -Djava.net.preferIPv6Addresses=false
-XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.lang=ALL-UNNAMED
--add-opens=java.base/java.lang.invoke=ALL-UNNAMED
--add-opens=java.base/java.lang.reflect=ALL-UNNAMED
--add-opens=java.base/java.io=ALL-UNNAMED
--add-opens=java.base/java.net=ALL-UNNAMED
--add-opens=java.base/java.nio=ALL-UNNAMED
--add-opens=java.base/java.util=ALL-UNNAMED
--add-opens=java.base/java.util.concurrent=ALL-UNNAMED
--add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED
--add-opens=java.base/jdk.internal.ref=ALL-UNNAMED
--add-opens=java.base/sun.nio.ch=ALL-UNNAMED
--add-opens=java.base/sun.nio.cs=ALL-UNNAMED
--add-opens=java.base/sun.security.action=ALL-UNNAMED
--add-opens=java.base/sun.util.calendar=ALL-UNNAMED
--add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED
-Djdk.reflect.useDirectMethodHandle=false -Ddubbo.application.qos.enable=false
-Duser.timeZone=GMT+08 -Dcom.amazonaws.services.s3.enableV4=true
-Djava.net.preferIPv4Stack=true -XX:MetaspaceSize=512m -XX:MaxMetaspaceSize=512m
-XX:MaxDirectMemorySize=1g -XX:+UseCompressedOops -XX:ParallelGCThreads=8
-XX:ConcGCThreads=4 -XX:+UseG1GC -XX:SoftRefLRUPolicyMSPerMB=0
-XX:OnOutOfMemoryError="kill -9 %p" -verbose:gc -XX:+PrintGCDetails
-XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC
-Xloggc:<LOG_DIR>/gc.log -XX:MaxDirectMemorySize=3686m
spark.executor.heartbeatInterval | 60s
spark.executor.id | driver
spark.executor.instances | 10
spark.executor.memory | 4g
spark.executor.memoryOverhead | 4G
spark.executorEnv.PYTHONPATH | {{PWD}}/pyspark.zip<CPS>{{PWD}}/py4j-0.10.9.7-src.zip<CPS>{{PWD}}/pyspark-3.1.2-20230509.zip<CPS>{{PWD}}/py4j-0.10.9-src.zip<CPS>{{PWD}}/pyspark.zip<CPS>{{PWD}}/py4j-0.10.9-src.zip
spark.gluten.loadLibFromJar | true
spark.gluten.memory.conservative.task.offHeap.size.in.bytes | 402653184
spark.gluten.memory.offHeap.size.in.bytes | 3221225472
spark.gluten.memory.task.offHeap.size.in.bytes | 805306368
spark.gluten.sql.session.timeZone.default | UTC
spark.hadoop.fs.s3.access.key | *********(redacted)
spark.hadoop.fs.s3.connection.ssl.enabled | false
spark.hadoop.fs.s3.endpoint | s3.ap-southeast-1.amazonaws.com
spark.hadoop.fs.s3.getObject.initialSocketTimeoutMilliseconds | 2000
spark.hadoop.fs.s3.impl | org.apache.hadoop.fs.s3a.S3AFileSystem
spark.hadoop.fs.s3.path.style.access | false
spark.hadoop.fs.s3.secret.key | *********(redacted)
spark.hadoop.fs.s3a.access.key | *********(redacted)
spark.hadoop.fs.s3a.aws.credentials.provider | org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider
spark.hadoop.fs.s3a.connection.ssl.enabled | false
spark.hadoop.fs.s3a.endpoint | s3.ap-southeast-1.amazonaws.com
spark.hadoop.fs.s3a.impl | org.apache.hadoop.fs.s3a.S3AFileSystem
spark.hadoop.fs.s3a.path.style.access | false
spark.hadoop.fs.s3a.secret.key | *********(redacted)
spark.hadoop.fs.s3n.access.key | *********(redacted)
spark.hadoop.fs.s3n.connection.ssl.enabled | false
spark.hadoop.fs.s3n.endpoint | s3.ap-southeast-1.amazonaws.com
spark.hadoop.fs.s3n.impl | org.apache.hadoop.fs.s3a.S3AFileSystem
spark.hadoop.fs.s3n.path.style.access | false
spark.hadoop.fs.s3n.secret.key | *********(redacted)
spark.hadoop.hive.exec.dynamic.partition.mode | nonstrict
spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version | 2
spark.hadoop.mapreduce.fileoutputcommitter.cleanup-failures.ignored | true
spark.hadoop.mapreduce.input.fileinputformat.split.minsize | 268435456
spark.hadoop.orc.overwrite.output.file | true
spark.history.fs.logDirectory | s3a:/xxxx/yarn-eks/bigdata-sparkhistoryserver/jhs/
spark.kryoserializer.buffer.max | 128m
spark.livy.owner | wireless
spark.livy.spark_major_version | 3
spark.locality.wait | 0s
spark.master | yarn
spark.maxRemoteBlockSizeFetchToMem | 512m
spark.memory.offHeap.enabled | true
spark.memory.offHeap.size | 3g
spark.network.timeout | 120s
spark.org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.param.PROXY_HOSTS | yarnrm1b-0.yarnrm.bigdata.svc.cluster.local,yarnrm1b-1.yarnrm.bigdata.svc.cluster.local
spark.org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.param.PROXY_URI_BASES | http://yarnrm1b-0.yarnrm.bigdata.svc.cluster.local:8088/proxy/application_1693383838041_3359264,http://yarnrm1b-1.yarnrm.bigdata.svc.cluster.local:8088/proxy/application_1693383838041_3359264
spark.org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.param.RM_HA_URLS | yarnrm1b-0.yarnrm.bigdata.svc.cluster.local:8088,yarnrm1b-1.yarnrm.bigdata.svc.cluster.local:8088
spark.plugins | io.glutenproject.GlutenPlugin
spark.reducer.maxBlocksInFlightPerAddress | 1000
spark.reducer.maxReqsInFlight | 1000
spark.repl.class.outputDir | /data/data1/yarn/nm/usercache/hive/appcache/application_1693383838041_3359264/container_e32_1693383838041_3359264_01_000001/tmp/spark7852604390862141239
spark.repl.class.uri | spark://xxx:44823/classes
spark.scheduler.mode | FIFO
spark.serializer | org.apache.spark.serializer.KryoSerializer
spark.shuffle.consolidateFiles | true
spark.shuffle.io.maxRetries | 5
spark.shuffle.io.retryWait | 10
spark.shuffle.manager | org.apache.spark.shuffle.gluten.celeborn.CelebornShuffleManager
spark.shuffle.registration.maxAttempts | 5
spark.shuffle.registration.timeout | 120000
spark.shuffle.service.enabled | false
spark.shuffle.useOldFetchProtocol | true
spark.speculation | true
spark.speculation.interval | 10000
spark.speculation.quantile | 0.95
spark.sql.adaptive.advisoryPartitionSizeInBytes | 134217728
spark.sql.adaptive.enabled | true
spark.sql.adaptive.localShuffleReader.enabled | false
spark.sql.adaptive.shuffle.targetPostShuffleInputSize | 268435456
spark.sql.autoBroadcastJoinThreshold | 20971520
spark.sql.broadcastTimeout | 1200
spark.sql.catalogImplementation | hive
spark.sql.compatible.check.enabled | false
spark.sql.extensions | io.glutenproject.GlutenSessionExtensions
spark.sql.files.maxPartitionBytes | 268435456
spark.sql.files.openCostInBytes | 8388608
spark.sql.files.readParallelism | 1
spark.sql.hive.caseSensitiveInferenceMode | NEVER_INFER
spark.sql.legacy.allowHashOnMapType | true
spark.sql.legacy.timeParserPolicy | LEGACY
spark.sql.mapKeyDedupPolicy | LAST_WIN
spark.sql.orc.compression.codec | zlib
spark.sql.parquet.fs.optimized.committer.optimization-enabled | false
spark.sql.parquet.writeLegacyFormat | true
spark.sql.shuffle.partitions | 400
spark.sql.sources.parallelPartitionDiscovery.parallelism | 10
spark.sql.storeAssignmentPolicy | LEGACY
spark.storage.decommission.enabled | true
spark.storage.decommission.fallbackStorage.path | s3a://bi.oppo.com/yarn-eks/bigdata-default/
spark.storage.decommission.rddBlocks.enabled | true
spark.storage.decommission.shuffleBlocks.enabled | true
spark.submit.deployMode | cluster
spark.ui.filters | org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
spark.ui.port | 0
spark.yarn.am.extraJavaOptions | -Ddubbo.application.qos.enable=false
-Dcom.amazonaws.services.s3.enableV4=true -Djava.net.preferIPv4Stack=true
-XX:MetaspaceSize=512m -XX:MaxMetaspaceSize=512m -XX:MaxDirectMemorySize=2g
-XX:+UseCompressedOops -XX:ParallelGCThreads=8 -XX:ConcGCThreads=4 -XX:+UseG1GC
-XX:SoftRefLRUPolicyMSPerMB=0 -XX:OnOutOfMemoryError="kill -9 %p" -verbose:gc
-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps
-XX:+PrintHeapAtGC -Xloggc:<LOG_DIR>/gc.log
spark.yarn.am.memory | 2g
spark.yarn.am.waitTime | 600s
spark.yarn.app.container.log.dir | /data/data1/yarn/container-logs/application_1693383838041_3359264/container_e32_1693383838041_3359264_01_000001
spark.yarn.app.id | application_1693383838041_3359264
spark.yarn.archive | s3a:/xxxxapp/spark/spark3.4.2-gluten1.1.1-centos8-arm-20240523-1619.zip
spark.yarn.dist.archives | s3a://xxx/app/spark/sparkr/sparkr-20230509.zip#sparkr
spark.yarn.dist.files | file:///opt/apache-livy-0.7.1-incubating-SNAPSHOT-bin/conf/yarn/hive-site.xml
spark.yarn.historyServer.address | sparkhs:18080
spark.yarn.isPython | true
spark.yarn.maxAppAttempts | 1
spark.yarn.priority | 4
spark.yarn.queue | root.wireless_sg.daily
spark.yarn.secondary.jars | AdsUDF.2.0.0.jar,appstore_exposure_udf-1.0.jar,bdp_udf-1.0-SNAPSHOT.jar,hive_udf-1.0-jar-ip.jar,hive_udf-1.0-jar-position.jar,hive_udf-1.0.jar,hive_udfs-1.0.0.jar,universe_cdo_expose_obj_opt_format.jar,GeoIpParse.jar,kryo-shaded-4.0.2.jar,livy-api-0.7.0-incubating-SNAPSHOT.jar,livy-rsc-0.7.0-incubating-SNAPSHOT.jar,livy-thriftserver-session-0.7.0-incubating-SNAPSHOT.jar,minlog-1.3.0.jar,netty-all-4.1.47.Final.jar,objenesis-2.5.1.jar,commons-codec-1.9.jar,livy-client-common-0.7.0-incubating-SNAPSHOT.jar,livy-core_2.12-0.7.0-incubating-SNAPSHOT.jar,livy-repl_2.12-0.7.0-incubating-SNAPSHOT.jar,datanucleus-rdbms-4.1.19.jar,datanucleus-core-4.1.17.jar,datanucleus-api-jdo-4.2.4.jar
spark.yarn.submit.waitAppCompletion | false
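
For readers triaging the memory settings in the table above, the sketch below converts the three `spark.gluten.memory.*` byte values into binary units and checks how they relate to `spark.executor.cores`. The numbers are copied from the table. The interpretation that the per-task value is the total divided by the core count, and that the conservative value is half of that, is only an observation about these numbers, not a statement of how Gluten computes them.

```python
# Values copied from the configuration table in this report.
GIB = 1024 ** 3
MIB = 1024 ** 2

offheap_total = 3221225472     # spark.gluten.memory.offHeap.size.in.bytes
task_offheap = 805306368       # spark.gluten.memory.task.offHeap.size.in.bytes
conservative_task = 402653184  # spark.gluten.memory.conservative.task.offHeap.size.in.bytes
executor_cores = 4             # spark.executor.cores


def summarize():
    """Return the sizes in binary units plus two consistency checks."""
    return {
        "offheap_total_gib": offheap_total / GIB,
        "task_offheap_mib": task_offheap / MIB,
        "conservative_task_mib": conservative_task / MIB,
        # Observation only: total / cores equals the per-task value here.
        "total_div_cores_matches_task": offheap_total // executor_cores == task_offheap,
        # Observation only: half the per-task value equals the conservative value.
        "half_task_matches_conservative": task_offheap // 2 == conservative_task,
    }


if __name__ == "__main__":
    for key, value in summarize().items():
        print(key, value)
```

With these inputs the total is 3 GiB, which agrees with `spark.memory.offHeap.size | 3g` in the same table.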
### System information
Velox System Info v0.0.2
Commit: 8a935d575aa0a38c3324f7ee98c87b576eb7ad70
CMake Version: 3.20.2
System: Linux-4.18.0-240.10.1.el8_3.aarch64
Arch: aarch64
C++ Compiler: /opt/rh/gcc-toolset-10/root/usr/bin/c++
C++ Compiler Version: 10.3.1
C Compiler: /opt/rh/gcc-toolset-10/root/usr/bin/cc
C Compiler Version: 10.3.1
CMake Prefix Path: /usr/local;/usr;/;/usr;/usr/local;/usr/X11R6;/usr/pkg;/opt
### Relevant logs
```bash
[2024-05-27 01:45:46.687]Container exited with a non-zero exit code 134.
Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
/bin/bash: line 1: 2322107 Aborted (core dumped)
/usr/lib/jvm/java-1.8.0/bin/java -server -Xmx4096m
'-Djava.net.preferIPv6Addresses=false' '-XX:+IgnoreUnrecognizedVMOptions'
'--add-opens=java.base/java.lang=ALL-UNNAMED'
'--add-opens=java.base/java.lang.invoke=ALL-UNNAMED'
'--add-opens=java.base/java.lang.reflect=ALL-UNNAMED'
'--add-opens=java.base/java.io=ALL-UNNAMED'
'--add-opens=java.base/java.net=ALL-UNNAMED'
'--add-opens=java.base/java.nio=ALL-UNNAMED'
'--add-opens=java.base/java.util=ALL-UNNAMED'
'--add-opens=java.base/java.util.concurrent=ALL-UNNAMED'
'--add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED'
'--add-opens=java.base/jdk.internal.ref=ALL-UNNAMED'
'--add-opens=java.base/sun.nio.ch=ALL-UNNAMED'
'--add-opens=java.base/sun.nio.cs=ALL-UNNAMED'
'--add-opens=java.base/sun.security.action=ALL-UNNAMED'
'--add-opens=java.base/sun.util.calendar=ALL-UNNAMED'
'--add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED'
'-Djdk.reflect.useDirectMethodHandle=false'
'-Ddubbo.application.qos.enable=false' '-Duser.timeZone=GMT+08'
'-Dcom.amazonaws.services.s3.enableV4=true' '-Djava.net.preferIPv4Stack=true'
'-XX:MetaspaceSize=512m' '-XX:MaxMetaspaceSize=512m'
'-XX:MaxDirectMemorySize=1g' '-XX:+UseCompressedOops' '-XX:ParallelGCThreads=8'
'-XX:ConcGCThreads=4' '-XX:+UseG1GC' '-XX:SoftRefLRUPolicyMSPerMB=0'
'-XX:OnOutOfMemoryError=kill -9 %p' '-verbose:gc' '-XX:+PrintGCDetails'
'-XX:+PrintGCTimeStamps' '-XX:+PrintGCDateStamps' '-XX:+PrintHeapAtGC'
'-Xloggc:/data/data1/yarn/container-logs/application_1693383838041_3359264/container_e32_1693383838041_3359264_01_000003/gc.log'
'-XX:MaxDirectMemorySize=3686m'
-Djava.io.tmpdir=/data/data1/yarn/nm/usercache/hive/appcache/application_1693383838041_3359264/container_e32_1693383838041_3359264_01_000003/tmp
'-Dspark.network.timeout=120s' '-Dspark.driver.port=44823' '-Dspark.ui.port=0'
-Dspark.yarn.app.container.log.dir=/data/data1/yarn/container-logs/application_1693383838041_3359264/container_e32_1693383838041_3359264_01_000003
org.apache.spark.executor.YarnCoarseGrainedExecutorBackend --driver-url
spark://[email protected]:44823
--executor-id 2 --hostname xx.bigdata.pod.cluster.local --cores 4 --app-id
application_1693383838041_3359264 --resourceProfileId 0 >
/data/data1/yarn/container-logs/application_1693383838041_3359264/container_e32_1693383838041_3359264_01_000003/stdout
2>
/data/data1/yarn/container-logs/application_1693383838041_3359264/container_e32_1693383838041_3359264_01_000003/stderr
Last 4096 bytes of stderr :
bytes in memory (estimated size 203.7 KiB, free 5.2 GiB)
24/05/27 01:45:35 INFO TorrentBroadcast: Reading broadcast variable 4 took
81 ms
24/05/27 01:45:35 INFO MemoryStore: Block broadcast_4 stored as values in
memory (estimated size 612.5 KiB, free 5.2 GiB)
24/05/27 01:45:36 INFO deprecation: mapred.max.split.size is deprecated.
Instead, use mapreduce.input.fileinputformat.split.maxsize
24/05/27 01:45:39 INFO BaseAllocator: Debug mode disabled. Enable with the
VM option -Darrow.memory.debug.allocator=true.
24/05/27 01:45:39 INFO DefaultAllocationManagerOption: allocation manager
type not specified, using netty as the default type
24/05/27 01:45:39 INFO CheckAllocator: Using DefaultAllocationManager at
memory/DefaultAllocationManagerFactory.class
24/05/27 01:45:40 INFO TorrentBroadcast: Started reading broadcast variable
1 with 1 pieces (estimated total size 4.0 MiB)
24/05/27 01:45:40 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes
in memory (estimated size 33.3 KiB, free 5.2 GiB)
24/05/27 01:45:40 INFO TorrentBroadcast: Reading broadcast variable 1 took 9
ms
24/05/27 01:45:40 INFO MemoryStore: Block broadcast_1 stored as values in
memory (estimated size 48.1 KiB, free 5.2 GiB)
Assertion failure: hp.second == srcChunk->tag(srcI)
Message:
File: /usr/local/include/folly/container/detail/F14Table.h
Line: 2064
Function: rehashImpl
```
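The exit code 134 in the log above follows the usual shell convention of 128 plus a signal number, so it corresponds to signal 6 (SIGABRT on Linux). That is consistent with the `Aborted (core dumped)` line and with the process aborting after the failed folly assertion. A minimal sketch for decoding such codes follows; `describe_exit_code` is a hypothetical helper name and is not part of Spark, YARN, or Gluten.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Interpret a shell-style exit status, where 128 + N means signal N."""
    if code > 128:
        signum = code - 128
        try:
            # Look up the platform's name for this signal number.
            name = signal.Signals(signum).name
        except ValueError:
            return f"terminated by unknown signal {signum}"
        return f"terminated by {name} (signal {signum})"
    return f"exited with status {code}"


if __name__ == "__main__":
    # 134 is the container exit code reported in the log above.
    print(describe_exit_code(134))
```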
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]