GitHub user GlutenPerfBot created a discussion: January 09, 2026: Weekly Status Update in Gluten
*This weekly update is generated by LLMs. You're welcome to join our [GitHub](https://github.com/apache/incubator-gluten/discussions) for in-depth discussions.*

## Overall Activity Summary

The Gluten community has been intensely focused on Spark version compatibility and infrastructure improvements. This week saw significant progress on Spark 4.1 support, daily Velox version updates, and cleanup of deprecated Spark 3.2 code. The project merged 287k+ lines of Spark 4.1 unit tests and continued aggressive deprecation of Spark 3.2 support.

## Key Ongoing Projects

**Spark 4.1 Support Initiative** - The team delivered comprehensive Spark 4.1 compatibility:
- #11353 by @baibaichen added 287k+ lines of Spark 4.1 unit tests, with extensive test exclusions to be fixed later
- #11347 by @baibaichen introduced the Spark 4.1 shim layer with 62 files of compatibility changes
- #11380 by @baibaichen continues expanding Spark 4.1 test coverage (currently in draft)

**Spark 3.2 Deprecation** - Major cleanup effort led by @QCLyu:
- #11351 removes Spark 3.2 support comprehensively from source code, build profiles, CI pipelines, and documentation
- #11379 tracks remaining Spark 3.2 compatibility code that needs removal

**Infrastructure & Performance**:
- Daily Velox version updates continue (#11385, #11378, #11375, #11366, #11356) by @GlutenPerfBot
- #11373 by @infvg adds HDFS integration tests to improve storage backend coverage
- #11261 by @WangGuangxin introduces an experimental Bolt backend (269k+ lines, 1055 files)

## Priority Items

**Critical Bug Fixes Needed**:
- #11372 by @Surbhi-Vijay - Spark 4.0 LeftSingle join support for correlated scalar subqueries (has open PR #11387)
- #11336 - Memory leak in ColumnarPartialProject causing 1.18GB memory pool leaks
- #11368/#11369 - Multiple AdaptiveQueryExecSuite test failures in both Velox and ClickHouse backends

**Performance Issues**:
- #11335 - HashJoin operation hanging on TPC-DS q72 (120s vs. 40s on vanilla Spark)
- #7269 - Umbrella issue tracking adoption performance problems including task kills, OOM, and spill performance

## Notable Discussions

**Development Process** - #11352 by @baibaichen proposes keeping commit history when adding new Spark versions, using a manual rebase + merge commit approach similar to Apache Iceberg's process.

**Platform Support** - Community inquiries about KylinOS support (#11333) and Parquet Modular Encryption (#11338) show growing enterprise adoption interest.

## Emerging Trends

1. **Spark Version Acceleration** - Rapid support for Spark 4.1 following the 4.0 pattern, suggesting an annual Spark version support cycle
2. **Backend Diversification** - Bolt backend introduction alongside Velox/ClickHouse indicates a multi-backend strategy
3. **Memory Management Focus** - Multiple memory-related issues and optimizations suggest this is a critical adoption blocker
4. **Test Infrastructure Maturation** - Systematic daily Velox updates and comprehensive test suite additions show a production readiness focus

## Good First Issues

**#11383** - Add Velox hash join bloom filter configurations. This involves adding two new configuration parameters to expose Velox's recent bloom filter functionality. Good for understanding Gluten's configuration system and Velox integration.

**#11316** - JDK-21 support for Spark-4.x. Spark 4.0 already supports JDK-21, and Gluten needs to follow. This is a foundational infrastructure task that touches build systems and compatibility layers.

**#8960** - Deprecate Spark-3.2 unit tests. As Spark 3.2 support is being removed, this involves cleaning up test infrastructure and documentation. Perfect for understanding Gluten's testing framework and CI/CD processes.

GitHub link: https://github.com/apache/incubator-gluten/discussions/11388

----
This is an automatically sent email for [email protected].
To unsubscribe, please send an email to: [email protected]
