GitHub user GlutenPerfBot created a discussion: February 06, 2026: Weekly 
Status Update in Gluten

*This weekly update is generated by LLMs. You're welcome to join our 
[GitHub](https://github.com/apache/incubator-gluten/discussions) for in-depth 
discussions.*

## Overall Activity Summary
The Gluten project saw 49 pull requests and 21 issues over the past week. Key 
themes include Spark 4.x compatibility improvements, build system enhancements, 
memory management optimizations, and ANSI mode support expansion. The community 
is working on stabilizing the upcoming 1.6.0 release while addressing critical 
memory and performance issues.

## Key Ongoing Projects

### Spark 4.x Compatibility Initiative
@baibaichen is leading a major effort to ensure full Spark 4.0/4.1 
compatibility, with 400+ test suites being enabled. Recent fixes include:
- #11577 by @baibaichen enabling GlutenSingleJoinSuite for Spark 4.0+
- #11580 by @baibaichen fixing the XML expressions test suite
- #11579 by @Surbhi-Vijay fixing multiple test suites for Spark 4.0+

### Build System Modernization
@liuneng1994 introduced a complete Gradle build system (#11576) that coexists 
with Maven, offering 2.5x faster cold builds and 118x faster incremental 
builds. This represents a significant developer experience improvement.

### Memory Management Improvements
Several critical memory-related fixes:
- #11553 by @malinjawi making TreeMemoryConsumer thread-safe
- #11532 by @clee704 fixing a protobuf memory leak in JNI_OnUnload
- #11485 by @baibaichen fixing a race condition in VeloxMemoryManager

### ANSI Mode Support Expansion
@PHILO-HE continues leading the comprehensive ANSI mode support initiative 
(#10134), with recent additions including string-to-boolean casting and ongoing 
work on type casting functions.
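
For readers new to the ANSI work, the sketch below models how a 
string-to-boolean cast differs between ANSI and non-ANSI mode. It is an 
illustration only: `cast_string_to_boolean` and `ansi_enabled` are hypothetical 
names, not Gluten or Spark APIs, and the accepted literals reflect our 
understanding of Spark's behaviour rather than anything stated in #10134. 
Whitespace handling is simplified to Python's `strip()`.

```python
from typing import Optional

# Literals assumed to be accepted by Spark's string-to-boolean cast,
# compared case-insensitively after trimming surrounding whitespace.
TRUE_STRINGS = {"t", "true", "y", "yes", "1"}
FALSE_STRINGS = {"f", "false", "n", "no", "0"}


def cast_string_to_boolean(value: Optional[str], ansi_enabled: bool) -> Optional[bool]:
    """Hypothetical model of the cast; not a Gluten or Spark API."""
    if value is None:
        return None  # NULL input stays NULL in both modes
    normalized = value.strip().lower()
    if normalized in TRUE_STRINGS:
        return True
    if normalized in FALSE_STRINGS:
        return False
    if ansi_enabled:
        # ANSI mode: malformed input raises an error instead of yielding NULL
        raise ValueError(f"cannot cast {value!r} to BOOLEAN")
    return None  # non-ANSI mode: malformed input becomes NULL
```

The point of the sketch is the last branch: a native backend has to reproduce 
both the error path and the NULL path to match Spark in either mode.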

## Priority Items

### Critical Memory Issues
- #11542: RSS shuffle writer memory threshold problems causing OOM errors
- #11540: Window operations hitting memory limits
- #11541: RowToVeloxColumnar memory allocation issues

### Build and CI Infrastructure
- #11582: Java stack overflow issues during Spark-4.0 packaging
- #11501: Docker image optimization for faster CI builds
- #11511: CentOS 9 support for Spark unit tests

### Performance Optimizations
- #8931: Broadcast hash join optimization by @JkSelf showing a 1.29x 
performance improvement
- #11419: Delta write native statistics tracker by @zhztheplayer eliminating 
columnar-to-row (C2R) overhead

## Notable Discussions

### Performance Benchmarking
#11554: Community discussion on Velox Bloom Filter inefficiency compared to 
Databricks Photon at 1TB scale, highlighting the need for better large-scale 
filtering capabilities.

### Release Planning
#11568: Release manager scheduling for 1.6.0 (February 2026) through 1.10.0, 
with @zhztheplayer managing the upcoming 1.6.0 release.

### Platform Support
#11535: macOS Apple Silicon support discussion, indicating growing interest in 
local development on modern hardware.

## Emerging Trends

1. **Spark 4.x Migration Acceleration**: The project is rapidly moving toward 
full Spark 4.x compatibility with extensive test coverage being added.

2. **Memory Management Focus**: Significant engineering effort is being 
directed toward solving memory-related issues, particularly around shuffle 
operations and off-heap memory management.

3. **Build System Evolution**: The introduction of Gradle alongside Maven shows 
the project's commitment to developer experience improvements.

4. **ANSI Compliance Priority**: Growing emphasis on ANSI SQL compliance, 
especially with Spark 4.0 making ANSI mode the default.

5. **Performance Optimization**: Multiple PRs focused on reducing overhead and 
improving performance, particularly for Delta Lake operations and broadcast 
joins.

## Good First Issues

### #10134: ANSI Mode Support
**Skills needed**: Scala, SQL expressions, Spark internals
**Why it's good**: Well-documented issue with clear task breakdown. Perfect for 
understanding Spark's expression system and type casting. Each subtask is 
self-contained.

### #11501: Docker Dependencies Caching
**Skills needed**: Docker, CI/CD, Maven
**Why it's good**: Infrastructure improvement with clear requirements. Good 
introduction to Gluten's CI system and build optimization.

### #11511: CentOS 9 CI Support
**Skills needed**: GitHub Actions, Docker, Linux
**Why it's good**: Straightforward infrastructure task that helps understand 
the project's CI/CD pipeline and testing infrastructure.

### #11383: Velox Bloom Filter Configuration
**Skills needed**: Java, Configuration management
**Why it's good**: Simple configuration addition task that introduces Velox 
backend integration patterns.

### #11509: TreeMemoryConsumer Thread Safety
**Skills needed**: Java, Concurrent programming
**Why it's good**: Well-defined problem with existing error examples. Excellent 
for learning about Gluten's memory management architecture.

GitHub link: https://github.com/apache/incubator-gluten/discussions/11584
