GitHub user GlutenPerfBot created a discussion: December 05, 2025: Weekly 
Status Update in Gluten

*This weekly update is generated by LLMs. You're welcome to join our 
[GitHub](https://github.com/apache/incubator-gluten/discussions) for in-depth 
discussions.*

## Overall Activity Summary
The past 7 days have been exceptionally busy with 57 merged PRs and 21 open 
ones, showing strong momentum toward the next release. Spark 4.0 compatibility 
work dominates the queue, while infrastructure improvements (CI disk-space, 
vcpkg cleanup, Folly bump) keep the build healthy. A new “Bolt” backend 
prototype (#11261) appeared, and Velox continues to advance with daily version 
bumps and memory-management fixes.

## Key Ongoing Projects
- **Spark 4.0 compatibility push** – led by @zhouyuan, @marin-ma, 
@jinchengchenghh, @zml1206 and many others; ~40 test suites left to green-light 
(#11088)
- **Delta Lake write enhancements** – @zhztheplayer added overwrite-mode 
support (#11226) and fixed whole-stage ID generation (#11252)
- **Memory & stability** – @rui-mo landed a Velox memory-manager lifecycle fix 
(#11249); @zhztheplayer disabled Parquet metadata validation by default to 
avoid a regression (#11233)
- **Flink-on-Gluten expansion** – @KevinyhZou wired up the Kafka connector 
(#9554) and fixed JSON parse-error handling (#10799)
- **New backend experiment** – @WangGuangxin opened a 1,000-file draft for a 
“Bolt” backend (#11261)

## Priority Items
1. **#11088** – Spark 4.0 test-suite tracker; needs volunteers for remaining 
suites (CSV, Parquet, Delta merge, streaming, etc.)
2. **#11261** – Bolt backend PR is huge (264k LOC); early design review 
requested
3. **#11236** – GC-before-OOM patch by @zhztheplayer to curb off-heap broadcast 
leaks; under active review
4. **#11255** – “DNM TEST BHJ” draft by @zhouyuan; appears to hold a critical 
broadcast-hash-join experiment
5. **#11254** – Community discussion on Delta fallback; needs feedback from 
@zhztheplayer or @jinchengchenghh

## Notable Discussions
- **#11254** – Users seeing fallback to Spark when reading Delta 3.3.2; root 
cause is “UnknownFormat” for JSON manifest files and missing UDF registration
- **#8429** – Slack channel #incubator-gluten invitation link refreshed; 15 
comments show steady interest

## Emerging Trends
- **Spark 4.0 readiness** is now the gating item for the next release; almost 
every merged PR carries the #11088 tag
- **CI hygiene** is a parallel theme: disk-space fixes, Celeborn clean images, 
vcpkg pruning, clang-tidy introduction
- **Small-file handling** optimizations (#11051, #11232) indicate growing 
production workloads on object stores
- **Function-parity** work accelerates (div, map_from_arrays, split_part, etc.) 
as users migrate complex SQL

## Good First Issues
- **#6814** – Add ClickHouse expression `MakeYMInterval`; pure CH-side C++ 
implementation
- **#4730** – Implement `date_from_unix_date` for CH backend; good introduction 
to CH function registry
- **#6807** – Wire CH `split_part` string function; includes unit-test template
- **#6812** – Expose `SparkPartitionID()` in CH; requires reading Spark 
partition info
- **#6815** – Add `MapZipWith` higher-order function to CH; great for learning 
CH’s lambda infrastructure

All CH function tasks above are self-contained, come with existing examples, 
and need only basic C++ and function-registration knowledge, which makes them 
a good fit for first-time contributors!

GitHub link: https://github.com/apache/incubator-gluten/discussions/11262
