GitHub user GlutenPerfBot created a discussion: December 12, 2025: Weekly Status Update in Gluten
*This weekly update is generated by LLMs. You're welcome to join our [GitHub](https://github.com/apache/incubator-gluten/discussions) for in-depth discussions.*

## Overall Activity Summary

The Gluten project has been highly active over the past 7 days, with 55 merged PRs and 21 open PRs. The main focus areas include Spark 4.0 compatibility improvements, Velox backend optimizations, and infrastructure enhancements. Daily Velox version updates continue to be a regular occurrence, indicating active upstream integration.

## Key Ongoing Projects

- **Spark 4.0 Compatibility**: Major effort led by @zhouyuan and @baibaichen to address failing unit tests (#11088). Multiple PRs have been merged, including SQLQueryTestSuite support (#11136) and Parquet IO compatibility fixes (#11281).
- **Velox Backend Optimization**: Daily version updates by @GlutenPerfBot continue, with recent improvements in memory management (#11249) and aggregation node handling (#11264).
- **Code Quality Improvements**: Extensive cleanup efforts by @xinghuayu007, including IWYU tool integration (#11287) and parameter removal (#11285).
- **New Backend Development**: Work in progress on the Bolt backend (#11261) by @WangGuangxin, representing a Velox fork from ByteDance.

## Priority Items

- **Spark 4.0 Test Failures**: #11088 needs continued attention, with remaining test suites such as ArrowEvalPythonExecSuite and JsonFunctionsValidateSuite still failing.
- **Memory Management**: #11249 by @rui-mo addresses critical Velox memory manager lifecycle issues that could impact production stability.
- **Compression Optimization**: #11284 by @wecharyu fixes a ZSTD compression level mismatch between Spark and Velox, potentially significant for storage costs.
- **Spill Level Issues**: #10845 reports the HashBuild operator spill level exceeding the maximum, affecting query execution reliability.

## Notable Discussions

- **EMR Deployment Guide**: #11279 highlights the community need for better AWS EMR deployment documentation, with @ammarchalifah seeking guidance on production deployment.
- **Library Linking Strategy**: #11282 discusses a critical production issue with static vs dynamic linking of libstdc++ and libgcc, potentially affecting stability.
- **Delta Lake Support**: #11254 addresses fallback issues when reading Delta Lake tables, indicating ongoing compatibility challenges.

## Emerging Trends

- **Multi-Backend Strategy**: Growing interest in alternative backends, with both Omni (#10188) and Bolt (#11261) proposals, suggesting community desire for more options.
- **Production Deployment Focus**: Increased discussions around deployment guides, library conflicts, and production stability indicate maturing adoption.
- **Performance Optimization**: Multiple PRs focused on memory management, compression, and query plan optimization suggest production performance tuning is a priority.

## Good First Issues

- **#6814**: Support MakeYMInterval expression for ClickHouse backend - Good entry point for learning expression implementation
- **#4730**: Implement date_from_unix_date function for ClickHouse - Simple function addition with clear requirements
- **#6807**: Add split_part function support for ClickHouse - String manipulation function, well-defined scope
- **#6812**: Implement SparkPartitionID function for ClickHouse - Useful for partitioning scenarios, straightforward implementation
- **#6815**: Support MapZipWith expression for ClickHouse - More advanced but well-documented function with clear specifications

These issues are excellent starting points because they involve implementing specific functions or expressions with clear requirements, allowing new contributors to understand the codebase while making meaningful contributions. Most require basic C++ knowledge and an understanding of the ClickHouse backend architecture.

GitHub link: https://github.com/apache/incubator-gluten/discussions/11290
