GitHub user shadowmmu edited a discussion: [Question] TPC-DS 1TB Benchmarking results for Non-Partitioned Delta tables with Velox Backend
Hi Gluten Community,

I am currently exploring the performance of Apache Gluten with the **Velox backend** specifically for **Delta Lake** workloads. While there are several TPC-DS benchmark reports available for Parquet/ORC, I am looking for insights or existing benchmarking results for the following specific setup:

* **Scale Factor:** 1TB (TPC-DS)
* **Data Format:** Delta Lake (non-partitioned)
* **Backend:** Velox
* **Storage:** GCS

**Context:** We are evaluating the overhead of the Delta Log reading process versus the native acceleration provided by Velox. Specifically, we are interested in:

1. How **non-partitioned** Delta tables perform compared to standard Parquet in a Gluten environment.
2. If anyone has observed specific bottlenecks in metadata handling or scan performance with this configuration.
3. Recommended Spark/Gluten configurations to optimize the Delta-Velox scan path for large-scale non-partitioned data.

If anyone has run these benchmarks or has a performance comparison (Native Spark vs. Gluten+Velox) for this setup, I would greatly appreciate it if you could share your findings or any tuning tips!

Thanks!

GitHub link: https://github.com/apache/incubator-gluten/discussions/11463
