GitHub user GlutenPerfBot created a discussion: December 19, 2025: Weekly 
Status Update in Gluten

*This weekly update is generated by LLMs. You're welcome to join our 
[GitHub](https://github.com/apache/incubator-gluten/discussions) for in-depth 
discussions.*

## Overall Activity Summary
The Gluten community merged 25 PRs and opened 9 new ones over the past 7 days. 
Major themes include Spark 4.0 compatibility fixes, daily Velox version bumps, 
infrastructure clean-ups, and new Flink connector work. Contributors are 
actively polishing the code base ahead of the next release.

## Key Ongoing Projects
- **Spark 4.0 compatibility** – @baibaichen, @zhztheplayer and @zhouyuan 
continue to burn down failing UTs (#11088); recent fixes cover Parquet IO, JSON 
functions, Arrow Python and Delta update commands
- **Daily Velox integration** – @GlutenPerfBot lands fresh Velox commits every 
day (#6887); this keeps Gluten in sync with upstream performance and function 
improvements
- **Flink ecosystem** – @KevinyhZou added Kafka source support (#9553, #11312) 
and filesystem sink (#10064, #11300), expanding Gluten beyond Spark
- **Code-quality & build** – @xinghuayu007 introduced IWYU (#11287) and 
clang-tidy (#11120) checks; @PHILO-HE cleaned legacy scripts (#11305) and 
directory names (#10219)
- **New backends** – @WangGuangxin posted an early PoC (#11261) for a "Bolt" 
backend; Bolt is a Velox fork from ByteDance

## Priority Items
- **Memory manager stability** – #11249 by @rui-mo (merged) fixes a task-level 
race between the Velox destructor and async I/O; critical for production
- **Zstandard compression** – #11284 by @wecharyu (merged) aligns the Velox 
compression level with Spark's default (3) to avoid larger Parquet files
- **Spark 4.0 test failures** – #11088 still blocks the release; the 
ArrowEvalPython, JsonFunctions, CSV, and Hive suites need owners
- **GPU build broken** – #11302 (closed) required gcc-14 + cuda-toolkit-13.1; 
monitor follow-up PR #11275 by @zhouyuan
- **Parquet metadata validation** – disabled by default (#11233, #11307) until 
the performance regression is resolved

## Notable Discussions
- #11290 (Weekly status) – community call for EMR deployment guide and help 
with static/dynamic linking of libstdc++
- #11279 – @ammarchalifah asks for AWS EMR best-practice documentation; good 
place to contribute docs
- #11282 – production alert on linking strategy; maintainers weighing static vs 
dynamic libgcc/libstdc++

## Emerging Trends
1. Multi-engine support: the Flink Kafka source and filesystem sink show Gluten 
positioning itself as a universal native accelerator
2. Release readiness: flurry of minor clean-ups (TPP.txt removal, script typos, 
dead config flags) signals prep for stable branch
3. Backend diversification: Omni and Bolt experiments indicate demand for 
vendor-specific optimizations
4. Memory & stability focus: recent fixes for OOM-GC interaction, broadcast 
spill, and memory-manager teardown highlight production hardening

## Good First Issues
- #6814 – Implement MakeYMInterval expression for ClickHouse; straightforward 
function mapping
- #4730 – Add date_from_unix_date function for ClickHouse; already prototyped 
in #10026 by @soupam05 and needs review
- #6807 – Support split_part function for ClickHouse; string manipulation with 
clear spec
- #6812 – Add SparkPartitionID function for ClickHouse; useful for 
partition-aware queries
- #6815 – Support MapZipWith expression for ClickHouse; slightly more advanced 
but well-documented

All issues above are self-contained, require basic C++ and ClickHouse 
knowledge, and come with existing examples in the codebase, which makes them 
a good way for first-time contributors to learn Gluten's function registration 
flow.
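
For anyone picking up #6807 or #4730, it may help to have the expected Spark SQL behavior written down before mapping it to ClickHouse. The Python sketch below is only an illustrative model of the semantics described in the Spark SQL function documentation (1-based indexing, negative indexes counted from the end, an empty string for out-of-range parts, days counted from 1970-01-01). It is not Gluten or ClickHouse code, and the helper names are simply reused from the Spark function names for readability.

```python
from datetime import date, timedelta


def split_part(s, delimiter, part_num):
    """Illustrative model of Spark SQL's split_part (1-based index)."""
    if s is None or delimiter is None or part_num is None:
        return None  # any NULL input yields NULL
    if part_num == 0:
        raise ValueError("part_num must not be 0")
    # An empty delimiter means the string is not split at all.
    parts = [s] if delimiter == "" else s.split(delimiter)
    # Negative indexes count backward from the end of the parts.
    index = part_num - 1 if part_num > 0 else len(parts) + part_num
    if index < 0 or index >= len(parts):
        return ""  # an out-of-range part yields an empty string
    return parts[index]


def date_from_unix_date(days):
    """Illustrative model: days since 1970-01-01 to a calendar date."""
    if days is None:
        return None
    return date(1970, 1, 1) + timedelta(days=days)
```

A native implementation would be checked against the same edge cases (zero index, negative index, out-of-range index, empty delimiter, NULL inputs), ideally through the existing Spark unit tests rather than a standalone model like this one.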

GitHub link: https://github.com/apache/incubator-gluten/discussions/11315

----
This is an automatically sent email for [email protected].
To unsubscribe, please send an email to: [email protected]

