GitHub user GlutenPerfBot created a discussion: January 16, 2026: Weekly Status Update in Gluten
*This weekly update is generated by LLMs. You're welcome to join our [GitHub](https://github.com/apache/incubator-gluten/discussions) for in-depth discussions.*

## Overall Activity Summary

The Gluten community has been intensely focused on Spark version compatibility and infrastructure improvements. This week saw significant progress on Spark 4.1 support, daily Velox version updates, and cleanup of deprecated Spark 3.2 code. The project merged 287k+ lines of Spark 4.1 unit tests and continued aggressive deprecation of Spark 3.2 support.

## Key Ongoing Projects

- **Spark 4.1 Support Initiative** - The team delivered comprehensive Spark 4.1 compatibility:
  - #11353 by @baibaichen added 287k+ lines of Spark 4.1 unit tests with extensive test exclusions for later fixes
  - #11347 by @baibaichen introduced the Spark 4.1 shim layer with 62 files of compatibility changes
  - #11380 by @baibaichen continues expanding Spark 4.1 test coverage (currently in draft)
- **Spark 3.2 Deprecation** - Major cleanup effort led by @QCLyu:
  - #11351 removes Spark 3.2 support comprehensively from source code, build profiles, CI pipelines, and documentation
  - #11379 tracks remaining Spark 3.2 compatibility code that needs removal
- **Infrastructure & Performance**:
  - Daily Velox version updates continue (#11385, #11378, #11375, #11366, #11356) by @GlutenPerfBot
  - #11373 by @infvg adds HDFS integration tests to improve storage backend coverage
  - #11261 by @WangGuangxin introduces experimental Bolt backend (269k+ lines, 1055 files)

## Priority Items

**Critical Bug Fixes Needed**:
- #11421 by @FelixYBW - OOM due to hash-based shuffle during Delta Lake table creation
- #11432 by @zhouyuan - Docker image build failed due to Maven download timeout
- #11427 by @luomh1998 - ClickHouse build fails with Clang 17 compatibility issues
- #11394 by @xumanbu - Operator count inflation in qualification tool reports

**Performance Issues**:
- #11403 by @Surbhi-Vijay - Exception in evaluating deprecated dataset sum operations
- #11402 by @Surbhi-Vijay - Incorrect decimal casting from floating point values

## Notable Discussions

- #8429: Gluten Slack Channel setup on ASF workspace for community communication
- #11388: Weekly status updates showing systematic tracking of project progress
- #8226: 2025 Roadmap discussion covering Spark 4.0 support, upstream Velox adoption, and TLP submission

## Emerging Trends

- **Spark Version Acceleration** - Rapid support for Spark 4.1 following the 4.0 pattern, suggesting an annual Spark version support cycle
- **Backend Diversification** - Bolt backend introduction alongside Velox/ClickHouse indicates a multi-backend strategy
- **Memory Management Focus** - Multiple memory-related issues and optimizations suggest this is a critical adoption blocker
- **Test Infrastructure Maturation** - Systematic daily Velox updates and comprehensive test suite additions show a production readiness focus

## Good First Issues

- #10134: Add ANSI mode support - Well-documented issue with clear task breakdown for implementing Spark ANSI compliance across type casting, arithmetic functions, and date/time operations
- #11417: Enhance qualification tool for lakehouse formats - Update qualification tool to properly report scan benefits for Iceberg/Delta/Hudi/Paimon formats
- #11400: Track Spark 4.1.x failed unit tests - Systematic tracking and fixing of failing tests with clear categorization of issues

GitHub link: https://github.com/apache/incubator-gluten/discussions/11434
