GitHub user GlutenPerfBot created a discussion: September 08, 2025: Weekly Status Update in Gluten
*This weekly update is generated by LLMs. You're welcome to join our [Github](https://github.com/apache/incubator-gluten/discussions) for in-depth discussions.*

## Overall Activity Summary

The past 7 days saw 38 merged PRs and 29 open PRs across the Velox, ClickHouse, Flink, and build/infra areas. The Velox backend dominated with daily version bumps, shuffle-read optimizations, and new function enablements. Flink activity surged (7 PRs) around Nexmark benchmark support, while Iceberg/Delta Lake features and GPU/cuDF connectors are gaining momentum. The community is preparing for the Gluten 1.5.0 release with documentation clean-ups and CI improvements.

## Key Ongoing Projects

- **Shuffle-read performance overhaul** – @marin-ma merged #10499 to coalesce small batches and eliminate `VeloxResizeBatches`, cutting deserialize overhead in sort-based shuffle.
- **Iceberg write & partition support** – @jinchengchenghh landed #10497 for partition write; #10285 adds Iceberg functions (still fallback).
- **Delta Lake read/write PoC** – @zhztheplayer opened #10216 for Delta 3.3.1 write and #10639 for DV-enabled TPC-DS table generation.
- **Flink Nexmark completeness** – @shuai-xu & @KevinyhZou enabled q11-q21, processing-time windows, and Java-17 compatibility (#10548, #10631, #10572).
- **cuDF/GPU connector** – @jinchengchenghh introduced #10622 for a cuDF parquet reader and #10621 for GPU connector tracking.
- **Spark 4.0 CI readiness** – @zhouyuan enabled TPC-DS tests on Spark-400 (#10633) and moved Spark-3.2 tests to nightly (#8961).

## Priority Items

- **Release blockers for 1.5.0** (#10574) – @PHILO-HE is tracking open items: #10603 (config doc tidy-up by @zjuwangg) and #10641 (arrow URL typo by @liujiayi771).
- **Critical bug fixes** – #10644 (NumberFormatException in window bounds) by @mingyi6666; #10635 (file INSTALL permission) by @beliefer; #10511 (Delta column-mapping wrong results) by @sezruby.
- **Memory OOM & stability** – #7249 global-memory OOM during spill needs a Velox-side fix; #9846 deadlock in TableScan preload; #9845 ARM core-dump under investigation.

## Notable Discussions

- #10406: @ryyyyyy1 asks about SARG push-down in Flink and early Velox4j initialization; community feedback is wanted.
- #10214: long thread on high shuffle deserialize time; #10499 is already merged, but further tuning is invited.
- #8018: stage-level ResourceProfile auto-adjust design accepted; POC code to be contributed by @zjuwangg.

## Emerging Trends

1. **Lake-house acceleration** – Iceberg & Delta PRs are appearing daily; deletion-vector and column-mapping fixes show a push toward production readiness.
2. **Flink-first development** – the Nexmark benchmark now drives Flink function parity (UDFs, decimal, time attributes).
3. **GPU/cuDF integration** – the Velox cuDF parquet connector was merged; GPU memory config and connector stubs are checked in.
4. **Micro-performance focus** – repeated `identifyBatchType` (#10649), `StrictRule` simplification (#10553), and hash-table build configs (#10634) all target driver-side CPU.
5. **Documentation & CI hygiene** – daily Velox bumps automated, Spark-3.2 demoted to nightly, Maven enforcer rules relaxed (#10536).

## Good First Issues

- #6814: add ClickHouse expression `MakeYMInterval` – pure CH backend, no native changes.
- #4730: implement `date_from_unix_date` for ClickHouse – follow the existing date function pattern.
- #6807: support the `split_part` string function in ClickHouse – straightforward string-splitting logic.
- #6812: expose `SparkPartitionID` in the ClickHouse backend – reuses Spark's partition ID.
- #6815: implement `MapZipWith` for ClickHouse – an entry-level map function, good for learning the CH UDF framework.

All CH good-first issues need basic C++ and ClickHouse function registration knowledge; unit tests and documentation updates are expected.
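For anyone picking up the good-first issues above, the target behavior is the Spark SQL semantics of each function. As a rough orientation, here is a hedged pure-Python sketch of the expected semantics for `split_part`, `date_from_unix_date`, and `MakeYMInterval` (based on Spark's documented behavior: 1-based/negative indexing for `split_part`, days since 1970-01-01 for `date_from_unix_date`, and a months-encoded year-month interval). This is illustration only, not Gluten or ClickHouse code.

```python
from datetime import date, timedelta

def split_part(s: str, delimiter: str, part_num: int) -> str:
    """Sketch of Spark's split_part: 1-based index, negative counts
    from the end, out-of-range index yields the empty string."""
    if part_num == 0:
        raise ValueError("part_num must not be 0")
    parts = s.split(delimiter)
    idx = part_num - 1 if part_num > 0 else len(parts) + part_num
    return parts[idx] if 0 <= idx < len(parts) else ""

def date_from_unix_date(days: int) -> date:
    """Sketch of Spark's date_from_unix_date: days since 1970-01-01."""
    return date(1970, 1, 1) + timedelta(days=days)

def make_ym_interval(years: int = 0, months: int = 0) -> int:
    """Sketch of Spark's make_ym_interval: a year-month interval is
    internally a total month count (years * 12 + months)."""
    return years * 12 + months

print(split_part("11.12.13", ".", 2))   # -> 12
print(split_part("11.12.13", ".", -1))  # -> 13
print(date_from_unix_date(1))           # -> 1970-01-02
print(make_ym_interval(1, 2))           # -> 14 (months)
```

The actual ClickHouse backend work registers native functions and maps Spark's edge cases (nulls, out-of-range indexes) onto them; the sketch above only pins down the reference outputs that unit tests should check.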
GitHub link: https://github.com/apache/incubator-gluten/discussions/10652

----
This is an automatically sent email for [email protected]. To unsubscribe, please send an email to: [email protected]
