GitHub user GlutenPerfBot created a discussion: January 09, 2026: Weekly Status 
Update in Gluten

*This weekly update is generated by LLMs. You're welcome to join us on 
[GitHub](https://github.com/apache/incubator-gluten/discussions) for in-depth 
discussions.*

## Overall Activity Summary
This week the Gluten community focused on Spark version compatibility 
and infrastructure improvements, with significant progress on Spark 4.1 
support, daily Velox version updates, and removal of deprecated Spark 3.2 
code. The project merged 287k+ lines of Spark 4.1 unit tests.

## Key Ongoing Projects

**Spark 4.1 Support Initiative** - The team made substantial progress on 
Spark 4.1 compatibility:
- #11353 by @baibaichen added 287k+ lines of Spark 4.1 unit tests with 
extensive test exclusions for later fixes
- #11347 by @baibaichen introduced the Spark 4.1 shim layer with 62 files of 
compatibility changes
- #11380 by @baibaichen continues expanding Spark 4.1 test coverage (currently 
in draft)

**Spark 3.2 Deprecation** - Major cleanup effort led by @QCLyu:
- #11351 removes Spark 3.2 support comprehensively from source code, build 
profiles, CI pipelines, and documentation
- #11379 tracks remaining Spark 3.2 compatibility code that needs removal

**Infrastructure & Performance**:
- Daily Velox version updates continue (#11385, #11378, #11375, #11366, #11356) 
by @GlutenPerfBot
- #11373 by @infvg adds HDFS integration tests to improve storage backend 
coverage
- #11261 by @WangGuangxin introduces experimental Bolt backend (269k+ lines, 
1055 files)

## Priority Items

**Critical Bug Fixes Needed**:
- #11372 by @Surbhi-Vijay - Spark 4.0 LeftSingle join support for correlated 
scalar subqueries (has open PR #11387)
- #11336 - Memory pool leak of 1.18GB in ColumnarPartialProject
- #11368/#11369 - Multiple AdaptiveQueryExecSuite test failures in both Velox 
and ClickHouse backends

**Performance Issues**:
- #11335 - HashJoin operation hanging on TPC-DS q72 (120s vs. 40s on vanilla 
Spark)
- #7269 - Umbrella issue tracking adoption performance problems including task 
kills, OOM, and spill performance

## Notable Discussions

**Development Process** - #11352 by @baibaichen proposes keeping commit history 
when adding new Spark versions by using a manual rebase plus merge-commit 
approach, similar to Apache Iceberg's process.

**Platform Support** - Community inquiries about KylinOS support (#11333) and 
Parquet Modular Encryption (#11338) suggest growing interest from enterprise 
adopters.

## Emerging Trends

1. **Spark Version Acceleration** - Rapid support for Spark 4.1, following the 
pattern set by 4.0, suggests an annual Spark version support cycle
2. **Backend Diversification** - The introduction of the Bolt backend alongside 
Velox and ClickHouse indicates a multi-backend strategy
3. **Memory Management Focus** - Multiple memory-related issues and 
optimizations suggest that memory management is a critical adoption blocker
4. **Test Infrastructure Maturation** - Systematic daily Velox updates and 
comprehensive test suite additions show a focus on production readiness

## Good First Issues

**#11383** - Add Velox hash join bloom filter configurations. This involves 
adding two new configuration parameters to expose Velox's recent bloom filter 
functionality. Good for understanding Gluten's configuration system and Velox 
integration.

**#11316** - JDK-21 support for Spark-4.x. Spark 4.0 already supports JDK-21, 
and Gluten needs to follow. This is a foundational infrastructure task that 
touches build systems and compatibility layers.

**#8960** - Deprecate Spark-3.2 unit tests. As Spark 3.2 support is being 
removed, this involves cleaning up test infrastructure and documentation. 
Perfect for understanding Gluten's testing framework and CI/CD processes.

GitHub link: https://github.com/apache/incubator-gluten/discussions/11388

----
This is an automatically sent email for [email protected].
To unsubscribe, please send an email to: [email protected]