howardli9175 opened a new issue, #1893: URL: https://github.com/apache/auron/issues/1893
1. Environment

6 worker node YARN cluster, x86 architecture, each node with 64 cores and 500GB memory.
Hadoop 3.2.2
Spark 3.5.4
Blaze 5.0.0

2. How to reproduce

Running TPC-DS benchmark, 10TB dataset, Parquet + ZSTD compression.

```
spark.executor.cores=1
spark.executor.memory=16g
spark.executor.memoryOverhead=16g
spark.driver.cores=1
spark.driver.memory=20g
spark.blaze.enable true
spark.sql.extensions org.apache.spark.sql.blaze.BlazeSparkSessionExtension
spark.shuffle.manager org.apache.spark.sql.execution.blaze.shuffle.BlazeShuffleManager
spark.memory.offHeap.enabled false
```

Queries q24a and q24b failed. The error message is as shown in the figure. The failure can be reproduced every time. The failed stage has 200 tasks, with 164 succeeded and 36 failed.

3. Other scenarios where the queries succeed

On the 10TB dataset, without Blaze enabled, the queries succeed.
On the 1TB dataset, with Blaze enabled, the queries also succeed.
