acruise opened a new issue, #7200:
URL: https://github.com/apache/incubator-gluten/issues/7200
### Backend
VL (Velox)
### Bug description
I have a TPC-DS dataset in ORC format on S3. On vanilla Spark 3.5.1 on a
single node, this query completes in 1-3 seconds:
```
val customers =
  spark.read.orc("s3a://mybucket/datasets/tpcds_sf100.orc/customer/*.orc").toDF
customers.count()
```
With Gluten enabled (built from the 1.2.0 tag, with S3 support enabled),
initializing the DataFrame is fine. When I invoke `count()`, the expected
number of tasks is spawned, but they do nothing at all.
I've tried disabling whole-stage codegen, but it makes no difference.
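For reference, disabling whole-stage codegen from the same spark-shell session would look roughly like the sketch below. The exact commands used are not recorded in this report, so treat this as an assumption rather than what was actually run. `spark.sql.codegen.wholeStage` is the standard Spark SQL setting, and `explain()` prints the physical plan, which may help show which operators were handed to the native backend.

```scala
// Sketch only: assumes the `spark` session that spark-shell provides
// and the `customers` DataFrame defined in the snippet above.

// Turn off Spark's whole-stage code generation for subsequent queries.
spark.conf.set("spark.sql.codegen.wholeStage", "false")

// Print the physical plan for the scan before triggering the action.
customers.explain()
```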
### Spark version
Spark-3.5.x
### Spark configurations
```
/opt/spark/bin/spark-shell \
--jars /home/alex/incubator-gluten/package/target/gluten-velox-bundle-spark3.5_2.12-ubuntu_22.04_x86_64-1.2.0.jar \
--packages org.apache.hadoop:hadoop-aws:3.3.4 \
-c spark.hadoop.fs.s3a.aws.credentials.provider=com.amazonaws.auth.DefaultAWSCredentialsProviderChain \
-c spark.plugins=org.apache.gluten.GlutenPlugin \
-c spark.memory.offHeap.enabled=true \
-c spark.memory.offHeap.size=32G \
-c spark.driver.memory=8g \
-c spark.executor.memory=16g
```
### System information
No such script in v1.2.0 :)
It's a c5a.8xlarge (64GB, 32 cores, >100G local disk)
### Relevant logs
```bash
scala> customers.count
24/09/11 22:36:32 INFO FileSourceStrategy: Pushed Filters:
24/09/11 22:36:32 INFO FileSourceStrategy: Post-Scan Filters:
24/09/11 22:36:32 INFO GlutenFallbackReporter: Validation failed for plan:
Exchange[QueryId=0], due to: [FallbackByBackendSettings] Validation failed on
node Exchange.
24/09/11 22:36:32 INFO InputPartitionsUtil: Planning scan with bin packing,
max size: 6213148 bytes, open cost is considered as scanning 4194304 bytes.
24/09/11 22:36:32 INFO DAGScheduler: Registering RDD 5 (count at
<console>:27) as input to shuffle 0
24/09/11 22:36:32 INFO DAGScheduler: Got map stage job 1 (count at
<console>:27) with 22 output partitions
24/09/11 22:36:32 INFO DAGScheduler: Final stage: ShuffleMapStage 1 (count
at <console>:27)
24/09/11 22:36:32 INFO DAGScheduler: Parents of final stage: List()
24/09/11 22:36:32 INFO DAGScheduler: Missing parents: List()
24/09/11 22:36:32 INFO DAGScheduler: Submitting ShuffleMapStage 1
(MapPartitionsRDD[5] at count at <console>:27), which has no missing parents
24/09/11 22:36:32 INFO MemoryStore: Block broadcast_1 stored as values in
memory (estimated size 32.9 KiB, free 36.6 GiB)
24/09/11 22:36:32 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes
in memory (estimated size 14.6 KiB, free 36.6 GiB)
24/09/11 22:36:32 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory
on ip-172-31-0-251.us-west-1.compute.internal:38299 (size: 14.6 KiB, free: 36.6
GiB)
24/09/11 22:36:32 INFO SparkContext: Created broadcast 1 from broadcast at
DAGScheduler.scala:1585
24/09/11 22:36:32 INFO DAGScheduler: Submitting 22 missing tasks from
ShuffleMapStage 1 (MapPartitionsRDD[5] at count at <console>:27) (first 15
tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,
14))
24/09/11 22:36:32 INFO TaskSchedulerImpl: Adding task set 1.0 with 22 tasks
resource profile 0
24/09/11 22:36:32 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID
1) (ip-172-31-0-251.us-west-1.compute.internal, executor driver, partition 0,
PROCESS_LOCAL, 9202 bytes)
24/09/11 22:36:32 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID
2) (ip-172-31-0-251.us-west-1.compute.internal, executor driver, partition 1,
PROCESS_LOCAL, 9204 bytes)
24/09/11 22:36:32 INFO TaskSetManager: Starting task 2.0 in stage 1.0 (TID
3) (ip-172-31-0-251.us-west-1.compute.internal, executor driver, partition 2,
PROCESS_LOCAL, 9204 bytes)
24/09/11 22:36:32 INFO TaskSetManager: Starting task 3.0 in stage 1.0 (TID
4) (ip-172-31-0-251.us-west-1.compute.internal, executor driver, partition 3,
PROCESS_LOCAL, 9204 bytes)
24/09/11 22:36:32 INFO TaskSetManager: Starting task 4.0 in stage 1.0 (TID
5) (ip-172-31-0-251.us-west-1.compute.internal, executor driver, partition 4,
PROCESS_LOCAL, 9204 bytes)
24/09/11 22:36:32 INFO TaskSetManager: Starting task 5.0 in stage 1.0 (TID
6) (ip-172-31-0-251.us-west-1.compute.internal, executor driver, partition 5,
PROCESS_LOCAL, 9204 bytes)
24/09/11 22:36:32 INFO TaskSetManager: Starting task 6.0 in stage 1.0 (TID
7) (ip-172-31-0-251.us-west-1.compute.internal, executor driver, partition 6,
PROCESS_LOCAL, 9204 bytes)
24/09/11 22:36:32 INFO TaskSetManager: Starting task 7.0 in stage 1.0 (TID
8) (ip-172-31-0-251.us-west-1.compute.internal, executor driver, partition 7,
PROCESS_LOCAL, 9204 bytes)
24/09/11 22:36:32 INFO TaskSetManager: Starting task 8.0 in stage 1.0 (TID
9) (ip-172-31-0-251.us-west-1.compute.internal, executor driver, partition 8,
PROCESS_LOCAL, 9204 bytes)
24/09/11 22:36:32 INFO TaskSetManager: Starting task 9.0 in stage 1.0 (TID
10) (ip-172-31-0-251.us-west-1.compute.internal, executor driver, partition 9,
PROCESS_LOCAL, 9204 bytes)
24/09/11 22:36:32 INFO TaskSetManager: Starting task 10.0 in stage 1.0 (TID
11) (ip-172-31-0-251.us-west-1.compute.internal, executor driver, partition 10,
PROCESS_LOCAL, 9204 bytes)
24/09/11 22:36:32 INFO TaskSetManager: Starting task 11.0 in stage 1.0 (TID
12) (ip-172-31-0-251.us-west-1.compute.internal, executor driver, partition 11,
PROCESS_LOCAL, 9204 bytes)
24/09/11 22:36:32 INFO TaskSetManager: Starting task 12.0 in stage 1.0 (TID
13) (ip-172-31-0-251.us-west-1.compute.internal, executor driver, partition 12,
PROCESS_LOCAL, 9204 bytes)
24/09/11 22:36:32 INFO TaskSetManager: Starting task 13.0 in stage 1.0 (TID
14) (ip-172-31-0-251.us-west-1.compute.internal, executor driver, partition 13,
PROCESS_LOCAL, 9204 bytes)
24/09/11 22:36:32 INFO TaskSetManager: Starting task 14.0 in stage 1.0 (TID
15) (ip-172-31-0-251.us-west-1.compute.internal, executor driver, partition 14,
PROCESS_LOCAL, 9204 bytes)
24/09/11 22:36:32 INFO TaskSetManager: Starting task 15.0 in stage 1.0 (TID
16) (ip-172-31-0-251.us-west-1.compute.internal, executor driver, partition 15,
PROCESS_LOCAL, 9204 bytes)
24/09/11 22:36:32 INFO TaskSetManager: Starting task 16.0 in stage 1.0 (TID
17) (ip-172-31-0-251.us-west-1.compute.internal, executor driver, partition 16,
PROCESS_LOCAL, 9204 bytes)
24/09/11 22:36:32 INFO TaskSetManager: Starting task 17.0 in stage 1.0 (TID
18) (ip-172-31-0-251.us-west-1.compute.internal, executor driver, partition 17,
PROCESS_LOCAL, 9204 bytes)
24/09/11 22:36:32 INFO TaskSetManager: Starting task 18.0 in stage 1.0 (TID
19) (ip-172-31-0-251.us-west-1.compute.internal, executor driver, partition 18,
PROCESS_LOCAL, 9204 bytes)
24/09/11 22:36:32 INFO TaskSetManager: Starting task 19.0 in stage 1.0 (TID
20) (ip-172-31-0-251.us-west-1.compute.internal, executor driver, partition 19,
PROCESS_LOCAL, 9204 bytes)
24/09/11 22:36:32 INFO TaskSetManager: Starting task 20.0 in stage 1.0 (TID
21) (ip-172-31-0-251.us-west-1.compute.internal, executor driver, partition 20,
PROCESS_LOCAL, 9204 bytes)
24/09/11 22:36:32 INFO TaskSetManager: Starting task 21.0 in stage 1.0 (TID
22) (ip-172-31-0-251.us-west-1.compute.internal, executor driver, partition 21,
PROCESS_LOCAL, 9204 bytes)
24/09/11 22:36:32 INFO Executor: Running task 0.0 in stage 1.0 (TID 1)
24/09/11 22:36:32 INFO Executor: Running task 1.0 in stage 1.0 (TID 2)
24/09/11 22:36:32 INFO Executor: Running task 2.0 in stage 1.0 (TID 3)
24/09/11 22:36:32 INFO Executor: Running task 3.0 in stage 1.0 (TID 4)
24/09/11 22:36:32 INFO Executor: Running task 4.0 in stage 1.0 (TID 5)
24/09/11 22:36:32 INFO Executor: Running task 5.0 in stage 1.0 (TID 6)
24/09/11 22:36:32 INFO Executor: Running task 6.0 in stage 1.0 (TID 7)
24/09/11 22:36:32 INFO Executor: Running task 7.0 in stage 1.0 (TID 8)
24/09/11 22:36:32 INFO Executor: Running task 8.0 in stage 1.0 (TID 9)
24/09/11 22:36:32 INFO Executor: Running task 9.0 in stage 1.0 (TID 10)
24/09/11 22:36:32 INFO Executor: Running task 10.0 in stage 1.0 (TID 11)
24/09/11 22:36:32 INFO Executor: Running task 11.0 in stage 1.0 (TID 12)
24/09/11 22:36:32 INFO Executor: Running task 12.0 in stage 1.0 (TID 13)
24/09/11 22:36:32 INFO Executor: Running task 13.0 in stage 1.0 (TID 14)
24/09/11 22:36:32 INFO Executor: Running task 14.0 in stage 1.0 (TID 15)
24/09/11 22:36:32 INFO Executor: Running task 15.0 in stage 1.0 (TID 16)
24/09/11 22:36:32 INFO Executor: Running task 16.0 in stage 1.0 (TID 17)
24/09/11 22:36:32 INFO Executor: Running task 17.0 in stage 1.0 (TID 18)
24/09/11 22:36:32 INFO Executor: Running task 18.0 in stage 1.0 (TID 19)
24/09/11 22:36:32 INFO Executor: Running task 19.0 in stage 1.0 (TID 20)
24/09/11 22:36:32 INFO Executor: Running task 20.0 in stage 1.0 (TID 21)
24/09/11 22:36:32 INFO Executor: Running task 21.0 in stage 1.0 (TID 22)
[Stage 0:> (0 + 1) / 1][Stage 1:> (0 + 22) / 22]
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]