GitHub user GlutenPerfBot created a discussion: November 21, 2025: Weekly 
Status Update in Gluten

*This weekly update is generated by LLMs. You're welcome to join our 
[GitHub](https://github.com/apache/incubator-gluten/discussions) for in-depth 
discussions.*

## Overall Activity Summary
The past 7 days have been exceptionally productive for the Gluten project, with 
45 pull requests and 15 issues actively worked on. The community is heavily 
focused on performance optimizations, Spark 4.0 compatibility, and 
infrastructure improvements. Notable achievements include significant GPU batch 
processing enhancements and the successful integration of Spark 4.0 unit test 
frameworks.

## Key Ongoing Projects
🚀 **GPU Acceleration Initiative**
- #11090 by @jinchengchenghh introduces GPU batch resizing, cutting TPC-DS Q95 
runtime from 27s to 13s (roughly a 2x speedup)
- #11069 fixes VeloxResizeBatch integration with ReusedExchange, ensuring 
consistent GPU batch processing across all shuffle reads

🔧 **Spark 4.0 Compatibility Drive**
- #10725 by @zhouyuan, the comprehensive Spark 4.0 unit test framework, has 
been merged, marking a major milestone
- #11088 tracks remaining Spark 4.0 test failures, with active community 
engagement to resolve compatibility issues

💪 **Performance Optimization Efforts**
- #8931 by @JkSelf delivers 1.29x performance improvement in TPC-DS Q23a 
through optimized Broadcast Hash Join implementation
- #11051 by @marin-ma introduces intelligent small file partitioning, 
significantly reducing partition skew

## Priority Items
🚨 **Critical Bug Fixes Needed**
- #11091 by @zhouyuan - Weekly build failures across CentOS/Ubuntu platforms 
need immediate attention
- #11082 by @wForget - Casting strings with numeric suffixes incorrectly 
returns NULL
- #11080 by @wForget - `from_unixtime` produces incorrectly formatted 
timestamps

🔥 **High-Impact Performance Issues**
- #11070 by @yjshen - Gluten spills 16x more memory than vanilla Spark, 
leading to OOM kills
- #11057 by @beliefer - Aggregation result mismatches between Spark 3.2.2 and 
Gluten 1.3

## Notable Discussions
🌟 **New Backend Proposal**
- #10929 by @WangGuangxin introduces Bolt, a Velox fork from ByteDance 
promising enhanced stability and LLVM-based JIT compilation optimizations

🔧 **Infrastructure Challenges**
- #11054 discusses SSD cache + O_DIRECT alignment issues, seeking community 
input on 4KB blocksize compatibility
- #10686 clarifies the relationship between experimental and internal 
configuration flags
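
For readers unfamiliar with the alignment constraint behind #11054: on Linux, `O_DIRECT` I/O generally requires the file offset, the transfer length, and the buffer address to be multiples of the device's logical block size. The sketch below only illustrates the offset and length arithmetic for a 4KB block size. It is not how Gluten or Velox handle the problem, and the helper names (`align_down`, `align_up`, `aligned_read_range`) are hypothetical.

```python
# Illustrative only: expand an arbitrary byte range to block boundaries.
# BLOCK_SIZE is an assumption; real devices may report 512 or another value.
BLOCK_SIZE = 4096


def align_down(value: int, block: int = BLOCK_SIZE) -> int:
    """Round value down to the nearest multiple of block."""
    return value - (value % block)


def align_up(value: int, block: int = BLOCK_SIZE) -> int:
    """Round value up to the nearest multiple of block."""
    return ((value + block - 1) // block) * block


def aligned_read_range(offset: int, length: int, block: int = BLOCK_SIZE):
    """Return (aligned_offset, aligned_length, skip) for a requested range.

    aligned_offset and aligned_length are multiples of block and cover the
    requested bytes; skip is where the requested data starts inside the
    aligned buffer.
    """
    if offset < 0 or length < 0:
        raise ValueError("offset and length must be non-negative")
    start = align_down(offset, block)
    end = align_up(offset + length, block)
    return start, end - start, offset - start
```

A caller would issue the read with the aligned offset and length into a suitably aligned buffer, then slice `skip : skip + length` out of the result. The cost is reading extra bytes at both ends of the range.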

## Emerging Trends
- **GPU-First Architecture**: Multiple PRs indicate a strategic shift toward 
GPU-accelerated processing
- **Memory Management Crisis**: Several issues highlight excessive memory usage 
requiring urgent attention
- **Production Readiness Focus**: Increased emphasis on stability fixes and 
comprehensive testing
- **Multi-Backend Expansion**: Growing interest in alternative backends beyond 
Velox

## Good First Issues
Perfect for new contributors looking to make a meaningful impact:

- #6814: Implement MakeYMInterval expression for ClickHouse backend - Great 
introduction to expression handling
- #4730: Add date_from_unix_date function support - Straightforward function 
implementation
- #6807: Implement split_part function - Common string operation, 
well-documented requirements
- #6812: Add SparkPartitionID function support - Essential for partitioning 
operations
- #6815: Implement MapZipWith expression - Advanced but well-scoped functional 
programming feature

**Skills needed**: Basic C++ knowledge, understanding of database functions, 
willingness to learn Gluten's architecture. These issues offer excellent 
mentorship opportunities and quick wins for building confidence in the codebase.

GitHub link: https://github.com/apache/incubator-gluten/discussions/11151
