GitHub user GlutenPerfBot created a discussion: January 16, 2026: Weekly Status 
Update in Gluten

*This weekly update is generated by LLMs. You're welcome to join our 
[GitHub](https://github.com/apache/incubator-gluten/discussions) for in-depth 
discussions.*

## Overall Activity Summary
This week the Gluten community focused on Spark version compatibility and 
infrastructure improvements. Significant progress was made on Spark 4.1 
support, including the merge of 287k+ lines of Spark 4.1 unit tests, alongside 
daily Velox version updates and continued removal of deprecated Spark 3.2 
code.

## Key Ongoing Projects
- **Spark 4.1 Support Initiative** - The team made substantial progress on 
Spark 4.1 compatibility:
  - #11353 by @baibaichen added 287k+ lines of Spark 4.1 unit tests, with many 
tests excluded for now pending later fixes
  - #11347 by @baibaichen introduced the Spark 4.1 shim layer with 62 files of 
compatibility changes
  - #11380 by @baibaichen continues expanding Spark 4.1 test coverage 
(currently in draft)

- **Spark 3.2 Deprecation** - Major cleanup effort led by @QCLyu:
  - #11351 removes Spark 3.2 support comprehensively from source code, build 
profiles, CI pipelines, and documentation
  - #11379 tracks remaining Spark 3.2 compatibility code that needs removal

- **Infrastructure & Performance**:
  - Daily Velox version updates continue (#11385, #11378, #11375, #11366, 
#11356) by @GlutenPerfBot
  - #11373 by @infvg adds HDFS integration tests to improve storage backend 
coverage
  - #11261 by @WangGuangxin introduces an experimental Bolt backend (269k+ 
lines, 1055 files)

## Priority Items
**Critical Bug Fixes Needed**:
- #11421 by @FelixYBW - OOM due to hash-based shuffle during Delta Lake table 
creation
- #11432 by @zhouyuan - Docker image build failed due to Maven download timeout
- #11427 by @luomh1998 - ClickHouse build fails with Clang 17 compatibility 
issues
- #11394 by @xumanbu - Operator count inflation in qualification tool reports

**Correctness Issues**:
- #11403 by @Surbhi-Vijay - Exception in evaluating deprecated dataset sum 
operations
- #11402 by @Surbhi-Vijay - Incorrect decimal casting from floating point values

## Notable Discussions
- #8429: Gluten Slack Channel setup on ASF workspace for community communication
- #11388: Weekly status updates showing systematic tracking of project progress
- #8226: 2025 Roadmap discussion covering Spark 4.0 support, upstream Velox 
adoption, and TLP submission

## Emerging Trends
- **Spark Version Acceleration** - Rapid support for Spark 4.1, following the 
pattern set by 4.0, suggests an annual Spark version support cycle
- **Backend Diversification** - The introduction of the Bolt backend alongside 
Velox and ClickHouse indicates a multi-backend strategy
- **Memory Management Focus** - Multiple memory-related issues and 
optimizations suggest that memory management is a critical adoption blocker
- **Test Infrastructure Maturation** - Systematic daily Velox updates and 
comprehensive test suite additions show a focus on production readiness

## Good First Issues
- #10134: Add ANSI mode support - Well-documented issue with clear task 
breakdown for implementing Spark ANSI compliance across type casting, 
arithmetic functions, and date/time operations
- #11417: Enhance qualification tool for lakehouse formats - Update 
qualification tool to properly report scan benefits for 
Iceberg/Delta/Hudi/Paimon formats
- #11400: Track Spark 4.1.x failed unit tests - Systematic tracking and fixing 
of failing tests with clear categorization of issues

GitHub link: https://github.com/apache/incubator-gluten/discussions/11434
