GitHub user GlutenPerfBot created a discussion: January 30, 2026: Weekly Status 
Update in Gluten

*This weekly update is generated by LLMs. You're welcome to join our 
[GitHub Discussions](https://github.com/apache/incubator-gluten/discussions) 
for in-depth discussions.*

## Overall Activity Summary
The Gluten project showed strong momentum this week with 42 pull requests 
merged or opened, covering Velox version updates, Spark 4.x compatibility 
improvements, infrastructure enhancements, and bug fixes. Key themes include 
advancing Delta Lake write support, improving CI/CD infrastructure, and 
expanding test coverage for multiple Spark versions.

## Key Ongoing Projects

### Velox Backend Enhancements
- **Daily Velox Version Updates**: @GlutenPerfBot continues to maintain 
upstream compatibility with regular updates (#11529, #11498)
- **Delta Lake Native Write Support**: @zhztheplayer is leading efforts to 
eliminate columnar-to-row (C2R) conversion overhead in Delta writes (#11419), 
with native statistics tracking
- **Broadcast Hash Join Optimization**: @JkSelf implemented broadcast hash join 
(BHJ) performance improvements showing a 1.29x speedup in TPC-DS Q23a (#8931)

### Spark 4.x Compatibility
- **Comprehensive Test Suite Addition**: @baibaichen added 426 missing Gluten 
test suites for Spark 4.0 and 4.1 (#11512)
- **Python Version Updates**: @ReemaAlzaid upgraded CI to Python 3.10 for Spark 
4.1 compatibility (#11481)
- **Unit Test Fixes**: @loudongfeng is addressing remaining Spark 4.0/4.1 test 
failures (#11520)

### Infrastructure Improvements
- **Maven Wrapper Migration**: @yaooqinn standardized CI workflows to use the 
`./build/mvn` wrapper (#11515, #11496)
- **CentOS 9 Support**: @ReemaAlzaid added CentOS 9 CI support with 6 new test 
jobs (#11519)
- **Docker Optimization**: @zhouyuan implemented m2 repository caching for 
faster builds (#11469)

## Priority Items

### Critical Bug Fixes
- **Decimal Partition Key Reading**: @zhouyuan fixed decimal partition key 
support (#11518)
- **Window to Aggregate Conversion**: @lgbo-ustc resolved validation issues in 
window function conversions (#11523)
- **Parquet Write Options**: @boneanxs fixed an integer overflow in Parquet 
write configurations (#11504)

### Performance Optimizations
- **StrictRule Refactoring**: @beliefer achieved a 3.52% average performance 
improvement in TPC-DS benchmarks (#10553)
- **Batch Type Identification**: @beliefer optimized batch type identification 
calls for a 1.34% performance gain (#10573)

## Notable Discussions

### Community Building
- **Slack Channel Launch**: @zhouyuan announced the new #incubator-gluten Slack 
channel for real-time community interaction (#8429)

### Technical Challenges
- **GPU/CPU Mixed Cluster Scheduling**: @jinchengchenghh outlined requirements 
for intelligent task scheduling between GPU and CPU nodes (#11524)
- **Flink Integration Performance**: @ParyshevSergey raised concerns about 
Velox4j performance in Flink streaming scenarios (#11508)

## Emerging Trends

1. **Multi-Backend Maturation**: Strong focus on both Velox and ClickHouse 
backend improvements
2. **Spark Version Parity**: Accelerated efforts to support Spark 4.x features 
and maintain backward compatibility
3. **Native Format Optimization**: Continued push to eliminate C2R transitions 
for better performance
4. **Infrastructure Modernization**: Systematic updates to CI/CD, dependency 
management, and build processes

## Good First Issues

### #11511: CI Migration to CentOS 9
**Skills Needed**: GitHub Actions, Docker, CI/CD
**Why Good**: Well-defined scope with existing CentOS 8 implementation as 
reference. Great introduction to Gluten's testing infrastructure.

### #11509: TreeMemoryConsumer Thread Safety
**Skills Needed**: Java concurrency, memory management
**Why Good**: Clear problem description with error examples. Excellent for 
understanding Gluten's memory architecture.

### #11501: Docker Java Dependencies Caching
**Skills Needed**: Docker, Maven, Build optimization
**Why Good**: Tangible performance impact with clear success metrics. Good 
entry point into build system improvements.

### #11513: Iceberg input_file_name() Fix
**Skills Needed**: File format handling, debugging
**Why Good**: Isolated issue with clear expected behavior. Good introduction to 
file format integration.

### #11400: Spark 4.1 Test Suite Completion
**Skills Needed**: Spark internals, testing
**Why Good**: Multiple sub-tasks available with varying complexity. Excellent 
way to learn Spark compatibility requirements.

GitHub link: https://github.com/apache/incubator-gluten/discussions/11530
