GitHub user GlutenPerfBot created a discussion: November 14, 2025: Weekly 
Status Update in Gluten

*This weekly update is generated by LLMs. You're welcome to join us on 
[GitHub](https://github.com/apache/incubator-gluten/discussions) for in-depth 
discussions.*

## Overall Activity Summary
Over the past 7 days, **45 pull requests** and **15 issues** saw activity in 
the Gluten project. The community is focused on **performance optimizations**, 
**Spark 4.0 compatibility**, and **infrastructure improvements**. Notable 
results include GPU batch resizing (#11090) and the merge of the Spark 4.0 
unit test framework (#10725).

## Key Ongoing Projects

### 🚀 GPU Acceleration Initiative
- **#11090** by @jinchengchenghh introduces GPU batch resizing, cutting TPC-DS 
Q95 runtime from 27s to 13s (roughly a **2x performance improvement**)
- **#11069** fixes VeloxResizeBatch integration with ReusedExchange, ensuring 
consistent GPU batch processing across all shuffle reads

### 🔧 Spark 4.0 Compatibility Drive
- **#10725** by @zhouyuan successfully merged the comprehensive Spark 4.0 unit 
test framework, marking a major milestone
- **#11088** tracks remaining Spark 4.0 test failures, with active community 
engagement to resolve compatibility issues

### 💪 Performance Optimization Efforts
- **#8931** by @JkSelf delivers **1.29x performance improvement** in TPC-DS 
Q23a through optimized Broadcast Hash Join implementation
- **#11051** by @marin-ma introduces intelligent small file partitioning, 
significantly reducing partition skew

## Priority Items

### 🚨 Critical Bug Fixes Needed
- **#11091** by @zhouyuan - Weekly build failures across CentOS/Ubuntu 
platforms need immediate attention
- **#11082** by @wForget - Casting strings with numeric suffixes incorrectly 
returns NULL
- **#11080** by @wForget - `from_unixtime` produces incorrectly formatted 
timestamps

### 🔥 High-Impact Performance Issues
- **#11070** by @yjshen - **16x higher memory spilling** in Gluten vs vanilla 
Spark causing OOM kills
- **#11057** by @beliefer - Aggregation result mismatches between Spark 3.2.2 
and Gluten 1.3

## Notable Discussions

### 🌟 New Backend Proposal
- **#10929** by @WangGuangxin introduces **Bolt**, a Velox fork from ByteDance 
promising enhanced stability and LLVM-based JIT compilation optimizations

### 🔧 Infrastructure Challenges
- **#11054** discusses SSD cache + O_DIRECT alignment issues, seeking community 
input on 4KB blocksize compatibility
- **#10686** clarifies the relationship between experimental and internal 
configuration flags
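
As background for the O_DIRECT alignment question in #11054: on Linux, direct I/O generally requires the file offset, transfer length, and buffer address to be multiples of the device's logical block size. The sketch below shows only the offset and length arithmetic for widening an arbitrary byte range to block boundaries. It is an illustration, not code from Gluten or Velox; the 4 KB block size is taken from the discussion summary, and the helper names (`align_down`, `align_up`, `aligned_range`, `is_aligned`) are hypothetical.

```python
from typing import Tuple

# Assumed block size: 4 KB, as mentioned in the discussion summary.
# Real devices may report a different logical block size.
BLOCK_SIZE = 4096


def align_down(value: int, block: int = BLOCK_SIZE) -> int:
    """Round value down to the nearest multiple of block."""
    return value - (value % block)


def align_up(value: int, block: int = BLOCK_SIZE) -> int:
    """Round value up to the nearest multiple of block."""
    return align_down(value + block - 1, block)


def aligned_range(offset: int, length: int,
                  block: int = BLOCK_SIZE) -> Tuple[int, int]:
    """Return (aligned_offset, aligned_length) covering [offset, offset + length)."""
    start = align_down(offset, block)
    end = align_up(offset + length, block)
    return start, end - start


def is_aligned(offset: int, length: int, block: int = BLOCK_SIZE) -> bool:
    """Check whether both offset and length are multiples of block."""
    return offset % block == 0 and length % block == 0


if __name__ == "__main__":
    # A 100-byte read at offset 5000 is widened to one full 4 KB block.
    print(aligned_range(5000, 100))
    # A read that straddles a block boundary needs two blocks.
    print(aligned_range(4000, 200))
```

A caller would read the widened range into a block-aligned buffer and then slice out the bytes it originally asked for. Buffer address alignment is a separate requirement that this sketch does not cover.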

## Emerging Trends

1. **GPU-First Architecture**: Multiple PRs indicate a strategic shift toward 
GPU-accelerated processing
2. **Memory Management Crisis**: Several issues highlight excessive memory 
usage requiring urgent attention
3. **Production Readiness Focus**: Increased emphasis on stability fixes and 
comprehensive testing
4. **Multi-Backend Expansion**: Growing interest in alternative backends beyond 
Velox

## Good First Issues

Perfect for new contributors looking to make a meaningful impact:

- **#6814**: Implement MakeYMInterval expression for ClickHouse backend - Great 
introduction to expression handling
- **#4730**: Add date_from_unix_date function support - Straightforward 
function implementation
- **#6807**: Implement split_part function - Common string operation, 
well-documented requirements
- **#6812**: Add SparkPartitionID function support - Essential for partitioning 
operations
- **#6815**: Implement MapZipWith expression - Advanced but well-scoped 
functional programming feature

**Skills needed**: Basic C++ knowledge, understanding of database functions, 
willingness to learn Gluten's architecture. These issues offer excellent 
mentorship opportunities and quick wins for building confidence in the codebase.

GitHub link: https://github.com/apache/incubator-gluten/discussions/11093
