GitHub user shadowmmu edited a discussion: [Question] TPC-DS 1TB Benchmarking 
results for Non-Partitioned Delta tables with Velox Backend

Hi Gluten Community,

I am currently exploring the performance of Apache Gluten with the **Velox 
backend** specifically for **Delta Lake** workloads.

While there are several TPC-DS benchmark reports available for Parquet/ORC, I 
am looking for insights or existing benchmarking results for the following 
specific setup:

* **Scale Factor:** 1TB (TPC-DS)
* **Data Format:** Delta Lake (non-partitioned)
* **Backend:** Velox
* **Storage:** GCS

**Context:**
We are evaluating how the overhead of reading the Delta transaction log weighs 
against the native acceleration provided by Velox. Specifically, we are 
interested in:

1. How **non-partitioned** Delta tables perform compared to standard Parquet in 
a Gluten environment.
2. Whether anyone has observed specific bottlenecks in metadata handling or 
scan performance with this configuration.
3. Recommended Spark/Gluten configurations to optimize the Delta-Velox scan 
path for large-scale non-partitioned data.

If anyone has run these benchmarks or has a performance comparison (Native 
Spark vs. Gluten+Velox) for this setup, I would greatly appreciate it if you 
could share your findings or any tuning tips!
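
For reference, here is a minimal sketch of how I would set up the two runs so that results are comparable. It is an assumption about a reasonable baseline, not a tuned configuration: the helper names (`build_conf`, `time_query`) are hypothetical, the off-heap size is a placeholder, and the plugin class name depends on the Gluten release (older releases used a different package), so please check it against the version you run.

```python
import time
from statistics import median

# Common to both runs: Delta Lake's session extension and catalog.
DELTA_CONF = {
    "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
    "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
}

# Added only for the Gluten+Velox run. Values are placeholders, not recommendations.
GLUTEN_CONF = {
    # Assumed class name for recent Gluten releases; verify for your version.
    "spark.plugins": "org.apache.gluten.GlutenPlugin",
    "spark.memory.offHeap.enabled": "true",
    "spark.memory.offHeap.size": "20g",
    "spark.shuffle.manager": "org.apache.spark.shuffle.sort.ColumnarShuffleManager",
}


def build_conf(use_gluten):
    """Return the Spark settings for one run (native Spark or Gluten+Velox)."""
    conf = dict(DELTA_CONF)
    if use_gluten:
        conf.update(GLUTEN_CONF)
    return conf


def time_query(run_query, repeats=3):
    """Call a zero-argument function `repeats` times; return median wall-clock seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples.append(time.perf_counter() - start)
    return median(samples)
```

With PySpark, each key/value pair from `build_conf(...)` can be passed to `SparkSession.builder.config(key, value)`, and a query can be timed with something like `time_query(lambda: spark.sql(query_text).collect())`. Taking the median over a few repeats is only meant to dampen GCS latency noise between runs.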

Thanks!


GitHub link: https://github.com/apache/incubator-gluten/discussions/11463

