GitHub user GlutenPerfBot created a discussion: December 05, 2025: Weekly Status Update in Gluten
*This weekly update is generated by LLMs. You're welcome to join our [GitHub](https://github.com/apache/incubator-gluten/discussions) for in-depth discussions.*

## Overall Activity Summary

The past 7 days have been exceptionally busy, with 57 merged PRs and 21 open ones, showing strong momentum toward the next release. Spark 4.0 compatibility work dominates the queue, while infrastructure improvements (CI disk space, vcpkg cleanup, Folly bump) keep the build healthy. A new “Bolt” backend prototype (#11261) appeared, and Velox continues to advance with daily version bumps and memory-management fixes.

## Key Ongoing Projects

- **Spark 4.0 compatibility push** – led by @zhouyuan, @marin-ma, @jinchengchenghh, @zml1206 and many others; ~40 test suites left to green-light (#11088)
- **Delta Lake write enhancements** – @zhztheplayer added overwrite-mode support (#11226) and fixed whole-stage ID generation (#11252)
- **Memory & stability** – @rui-mo landed a Velox memory-manager lifecycle fix (#11249); @zhztheplayer disabled Parquet metadata validation by default to avoid a regression (#11233)
- **Flink-on-Gluten expansion** – @KevinyhZou wired in the Kafka connector (#9554) and fixed JSON parse-error handling (#10799)
- **New backend experiment** – @WangGuangxin opened a 1,000-file draft for the “Bolt” backend (#11261)

## Priority Items

1. **#11088** – Spark 4.0 test-suite tracker; needs volunteers for the remaining suites (CSV, Parquet, Delta merge, streaming, etc.)
2. **#11261** – Bolt backend PR is huge (264k LOC); early design review requested
3. **#11236** – GC-before-OOM patch by @zhztheplayer to curb off-heap broadcast leaks; under active review
4. **#11255** – “DNM TEST BHJ” draft by @zhouyuan; appears to hold a critical broadcast-hash-join experiment
5. **#11254** – Community discussion on Delta fallback; needs feedback from @zhztheplayer or @jinchengchenghh

## Notable Discussions

- **#11254** – Users are seeing fallback to Spark when reading Delta 3.3.2; the root cause is “UnknownFormat” for JSON manifest files and missing UDF registration
- **#8429** – Slack channel #incubator-gluten invitation link refreshed; 15 comments show steady interest

## Emerging Trends

- **Spark 4.0 readiness** is now the gating item for the next release; almost every merged PR carries the #11088 tag
- **CI hygiene** is a parallel theme: disk-space fixes, Celeborn clean images, vcpkg pruning, clang-tidy introduction
- **Small-file handling** optimizations (#11051, #11232) indicate growing production workloads on object stores
- **Function-parity** work is accelerating (div, map_from_arrays, split_part, etc.) as users migrate complex SQL

## Good First Issues

- **#6814** – Add ClickHouse expression `MakeYMInterval`; pure CH-side C++ implementation
- **#4730** – Implement `date_from_unix_date` for the CH backend; good introduction to the CH function registry
- **#6807** – Wire up the CH `split_part` string function; includes a unit-test template
- **#6812** – Expose `SparkPartitionID()` in CH; requires reading Spark partition info
- **#6815** – Add the `MapZipWith` higher-order function to CH; great for learning CH’s lambda infrastructure

All of the CH function tasks above are self-contained, come with existing examples, and need only basic C++ and function-registration knowledge, which makes them well suited to first-time contributors.

GitHub link: https://github.com/apache/incubator-gluten/discussions/11262

----
This is an automatically sent email for [email protected].
To unsubscribe, please send an email to: [email protected]

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
