GitHub user GlutenPerfBot created a discussion: November 14, 2025: Weekly Status Update in Gluten
*This weekly update is generated by LLMs. You're welcome to join our [GitHub](https://github.com/apache/incubator-gluten/discussions) for in-depth discussions.*

## Overall Activity Summary

The past 7 days have been exceptionally productive for the Gluten project, with **45 pull requests** and **15 issues** actively worked on. The community is heavily focused on **performance optimizations**, **Spark 4.0 compatibility**, and **infrastructure improvements**. Notable achievements include significant GPU batch processing enhancements and the successful integration of the Spark 4.0 unit test framework.

## Key Ongoing Projects

### 🚀 GPU Acceleration Initiative

- **#11090** by @jinchengchenghh introduces GPU batch resizing, achieving a **2x performance improvement** in TPC-DS Q95 (27s → 13s)
- **#11069** fixes VeloxResizeBatch integration with ReusedExchange, ensuring consistent GPU batch processing across all shuffle reads

### 🔧 Spark 4.0 Compatibility Drive

- **#10725** by @zhouyuan merged the comprehensive Spark 4.0 unit test framework, marking a major milestone
- **#11088** tracks the remaining Spark 4.0 test failures, with active community engagement to resolve compatibility issues

### 💪 Performance Optimization Efforts

- **#8931** by @JkSelf delivers a **1.29x performance improvement** in TPC-DS Q23a through an optimized Broadcast Hash Join implementation
- **#11051** by @marin-ma introduces small file partitioning, significantly reducing partition skew

## Priority Items

### 🚨 Critical Bug Fixes Needed

- **#11091** by @zhouyuan - Weekly build failures across CentOS/Ubuntu platforms need immediate attention
- **#11082** by @wForget - Cast of strings with numeric suffixes returning incorrect NULL values
- **#11080** by @wForget - from_unixtime function producing incorrect timestamp formatting

### 🔥 High-Impact Performance Issues

- **#11070** by @yjshen - **16x higher memory spilling** in Gluten vs vanilla Spark, causing OOM kills
- **#11057** by @beliefer - Aggregation result mismatches between Spark 3.2.2 and Gluten 1.3

## Notable Discussions

### 🌟 New Backend Proposal

- **#10929** by @WangGuangxin introduces **Bolt**, a Velox fork from ByteDance promising enhanced stability and LLVM-based JIT compilation optimizations

### 🔧 Infrastructure Challenges

- **#11054** discusses SSD cache + O_DIRECT alignment issues, seeking community input on 4KB block size compatibility
- **#10686** clarifies the relationship between experimental and internal configuration flags

## Emerging Trends

1. **GPU-First Architecture**: Multiple PRs indicate a strategic shift toward GPU-accelerated processing
2. **Memory Management Crisis**: Several issues highlight excessive memory usage requiring urgent attention
3. **Production Readiness Focus**: Increased emphasis on stability fixes and comprehensive testing
4. **Multi-Backend Expansion**: Growing interest in alternative backends beyond Velox

## Good First Issues

Perfect for new contributors looking to make a meaningful impact:

- **#6814**: Implement MakeYMInterval expression for ClickHouse backend - Great introduction to expression handling
- **#4730**: Add date_from_unix_date function support - Straightforward function implementation
- **#6807**: Implement split_part function - Common string operation, well-documented requirements
- **#6812**: Add SparkPartitionID function support - Essential for partitioning operations
- **#6815**: Implement MapZipWith expression - Advanced but well-scoped functional programming feature

**Skills needed**: Basic C++ knowledge, understanding of database functions, willingness to learn Gluten's architecture.

These issues offer excellent mentorship opportunities and quick wins for building confidence in the codebase.

GitHub link: https://github.com/apache/incubator-gluten/discussions/11093
