GitHub user GlutenPerfBot created a discussion: November 21, 2025: Weekly Status Update in Gluten
*This weekly update is generated by LLMs. You're welcome to join our [GitHub](https://github.com/apache/incubator-gluten/discussions) for in-depth discussions.*

## Overall Activity Summary

The past 7 days have been exceptionally productive for the Gluten project, with 45 pull requests and 15 issues actively worked on. The community is heavily focused on performance optimizations, Spark 4.0 compatibility, and infrastructure improvements. Notable achievements include significant GPU batch processing enhancements and the successful integration of Spark 4.0 unit test frameworks.

## Key Ongoing Projects

🚀 **GPU Acceleration Initiative**
- #11090 by @jinchengchenghh introduces GPU batch resizing, achieving a roughly 2x performance improvement in TPC-DS Q95 (27s → 13s)
- #11069 fixes VeloxResizeBatch integration with ReusedExchange, ensuring consistent GPU batch processing across all shuffle reads

🔧 **Spark 4.0 Compatibility Drive**
- #10725 by @zhouyuan merged the comprehensive Spark 4.0 unit test framework, marking a major milestone
- #11088 tracks remaining Spark 4.0 test failures, with active community engagement to resolve compatibility issues

💪 **Performance Optimization Efforts**
- #8931 by @JkSelf delivers a 1.29x performance improvement in TPC-DS Q23a through an optimized Broadcast Hash Join implementation
- #11051 by @marin-ma introduces small file partitioning, significantly reducing partition skew

## Priority Items

🚨 **Critical Bug Fixes Needed**
- #11091 by @zhouyuan - Weekly build failures across CentOS/Ubuntu platforms need immediate attention
- #11082 by @wForget - Casting strings with numeric suffixes returns incorrect NULL values
- #11080 by @wForget - from_unixtime function produces incorrect timestamp formatting

🔥 **High-Impact Performance Issues**
- #11070 by @yjshen - 16x higher memory spilling in Gluten vs vanilla Spark, causing OOM kills
- #11057 by @beliefer - Aggregation result mismatches between Spark 3.2.2 and Gluten 1.3

## Notable Discussions

🌟 **New Backend Proposal**
- #10929 by @WangGuangxin introduces Bolt, a Velox fork from ByteDance promising enhanced stability and LLVM-based JIT compilation optimizations

🔧 **Infrastructure Challenges**
- #11054 discusses SSD cache + O_DIRECT alignment issues, seeking community input on 4KB blocksize compatibility
- #10686 clarifies the relationship between experimental and internal configuration flags

## Emerging Trends

- **GPU-First Architecture**: Multiple PRs indicate a strategic shift toward GPU-accelerated processing
- **Memory Management Crisis**: Several issues highlight excessive memory usage requiring urgent attention
- **Production Readiness Focus**: Increased emphasis on stability fixes and comprehensive testing
- **Multi-Backend Expansion**: Growing interest in alternative backends beyond Velox

## Good First Issues

Perfect for new contributors looking to make a meaningful impact:

- #6814: Implement MakeYMInterval expression for ClickHouse backend - Great introduction to expression handling
- #4730: Add date_from_unix_date function support - Straightforward function implementation
- #6807: Implement split_part function - Common string operation, well-documented requirements
- #6812: Add SparkPartitionID function support - Essential for partitioning operations
- #6815: Implement MapZipWith expression - Advanced but well-scoped functional programming feature

**Skills needed**: Basic C++ knowledge, understanding of database functions, willingness to learn Gluten's architecture.

These issues offer excellent mentorship opportunities and quick wins for building confidence in the codebase.

GitHub link: https://github.com/apache/incubator-gluten/discussions/11151
