GitHub user GlutenPerfBot created a discussion: December 12, 2025: Weekly 
Status Update in Gluten

*This weekly update is generated by LLMs. You're welcome to join us on 
[GitHub](https://github.com/apache/incubator-gluten/discussions) for in-depth 
discussions.*

## Overall Activity Summary
The Gluten project has been highly active over the past 7 days with 55 merged 
PRs and 21 open PRs. The main focus areas include Spark 4.0 compatibility 
improvements, Velox backend optimizations, and infrastructure enhancements. 
Daily Velox version updates continue, indicating active upstream integration.

## Key Ongoing Projects
- **Spark 4.0 Compatibility**: Major effort led by @zhouyuan and @baibaichen to 
address failing unit tests (#11088). Multiple PRs have been merged including 
SQLQueryTestSuite support (#11136) and Parquet IO compatibility fixes (#11281).
- **Velox Backend Optimization**: Daily version updates by @GlutenPerfBot 
continue, with recent improvements in memory management (#11249) and 
aggregation node handling (#11264).
- **Code Quality Improvements**: Extensive cleanup efforts by @xinghuayu007 
including IWYU tool integration (#11287) and parameter removal (#11285).
- **New Backend Development**: Work is in progress on the Bolt backend (#11261) 
by @WangGuangxin. Bolt is a Velox fork from ByteDance.

## Priority Items
- **Spark 4.0 Test Failures**: #11088 needs continued attention. Test suites 
such as ArrowEvalPythonExecSuite and JsonFunctionsValidateSuite are still 
failing.
- **Memory Management**: #11249 by @rui-mo addresses critical Velox memory 
manager lifecycle issues that could impact production stability.
- **Compression Optimization**: #11284 by @wecharyu fixes a ZSTD compression 
level mismatch between Spark and Velox, which could be significant for storage 
costs.
- **Spill Level Issues**: #10845 reports the HashBuild operator's spill level 
exceeding the maximum, affecting query execution reliability.

## Notable Discussions
- **EMR Deployment Guide**: #11279 highlights the community's need for better 
AWS EMR deployment documentation, with @ammarchalifah seeking guidance on 
production deployment.
- **Library Linking Strategy**: #11282 discusses a critical production issue 
with static vs. dynamic linking of libstdc++ and libgcc, potentially affecting 
stability.
- **Delta Lake Support**: #11254 addresses fallback issues when reading Delta 
Lake tables, indicating ongoing compatibility challenges.

## Emerging Trends
- **Multi-Backend Strategy**: Growing interest in alternative backends with 
both Omni (#10188) and Bolt (#11261) proposals, suggesting community desire for 
more options.
- **Production Deployment Focus**: Increased discussions around deployment 
guides, library conflicts, and production stability indicate maturing adoption.
- **Performance Optimization**: Multiple PRs focused on memory management, 
compression, and query plan optimization suggest production performance tuning 
is a priority.

## Good First Issues
- **#6814**: Support MakeYMInterval expression for ClickHouse backend - Good 
entry point for learning expression implementation
- **#4730**: Implement date_from_unix_date function for ClickHouse - Simple 
function addition with clear requirements
- **#6807**: Add split_part function support for ClickHouse - String 
manipulation function, well-defined scope
- **#6812**: Implement SparkPartitionID function for ClickHouse - Useful for 
partitioning scenarios, straightforward implementation
- **#6815**: Support MapZipWith expression for ClickHouse - More advanced but 
well-documented function with clear specifications

These issues are excellent starting points: each involves implementing a 
specific function or expression with clear requirements, allowing new 
contributors to learn the codebase while making meaningful contributions. 
Most require basic C++ knowledge and an understanding of the ClickHouse backend 
architecture.

GitHub link: https://github.com/apache/incubator-gluten/discussions/11290
