GitHub user GlutenPerfBot created a discussion: January 30, 2026: Weekly Status Update in Gluten
*This weekly update is generated by LLMs. You're welcome to join our [GitHub](https://github.com/apache/incubator-gluten/discussions) for in-depth discussions.*

## Overall Activity Summary

The Gluten project showed strong momentum this week, with 42 pull requests merged or opened. They cover Velox version updates, Spark 4.x compatibility improvements, infrastructure enhancements, and bug fixes. Key themes include advancing Delta Lake write support, improving CI/CD infrastructure, and expanding test coverage across multiple Spark versions.

## Key Ongoing Projects

### Velox Backend Enhancements

- **Daily Velox Version Updates**: @GlutenPerfBot continues to maintain upstream compatibility with regular updates (#11529, #11498)
- **Delta Lake Native Write Support**: @zhztheplayer is leading efforts to eliminate columnar-to-row (C2R) overhead in Delta writes, with native statistics tracking (#11419)
- **Broadcast Hash Join Optimization**: @JkSelf implemented broadcast hash join (BHJ) performance improvements, showing a 1.29x speedup on TPC-DS Q23a (#8931)

### Spark 4.x Compatibility

- **Comprehensive Test Suite Addition**: @baibaichen added 426 missing Gluten test suites for Spark 4.0 and 4.1 (#11512)
- **Python Version Updates**: @ReemaAlzaid upgraded CI to Python 3.10 for Spark 4.1 compatibility (#11481)
- **Unit Test Fixes**: @loudongfeng is addressing the remaining Spark 4.0/4.1 test failures (#11520)

### Infrastructure Improvements

- **Maven Wrapper Migration**: @yaooqinn standardized CI workflows to use the `./build/mvn` wrapper (#11515, #11496)
- **CentOS 9 Support**: @ReemaAlzaid added CentOS 9 CI support with 6 new test jobs (#11519)
- **Docker Optimization**: @zhouyuan implemented m2 repository caching for faster builds (#11469)

## Priority Items

### Critical Bug Fixes

- **Decimal Partition Key Reading**: @zhouyuan fixed support for reading decimal partition keys (#11518)
- **Window to Aggregate Conversion**: @lgbo-ustc resolved validation issues when converting window functions to aggregates (#11523)
- **Parquet Write Options**: @boneanxs fixed an integer overflow in Parquet write configurations (#11504)

### Performance Optimizations

- **StrictRule Refactoring**: @beliefer achieved a 3.52% average performance improvement in TPC-DS benchmarks (#10553)
- **Batch Type Identification**: @beliefer optimized batch type identification calls for a 1.34% performance gain (#10573)

## Notable Discussions

### Community Building

- **Slack Channel Launch**: @zhouyuan announced the new #incubator-gluten Slack channel for real-time community interaction (#8429)

### Technical Challenges

- **GPU/CPU Mixed Cluster Scheduling**: @jinchengchenghh outlined requirements for intelligent task scheduling between GPU and CPU nodes (#11524)
- **Flink Integration Performance**: @ParyshevSergey raised concerns about Velox4j performance in Flink streaming scenarios (#11508)

## Emerging Trends

1. **Multi-Backend Maturation**: Strong focus on improvements to both the Velox and ClickHouse backends
2. **Spark Version Parity**: Accelerated efforts to support Spark 4.x features while maintaining backward compatibility
3. **Native Format Optimization**: Continued push to eliminate C2R transitions for better performance
4. **Infrastructure Modernization**: Systematic updates to CI/CD, dependency management, and build processes

## Good First Issues

### #11511: CI Migration to CentOS 9

- **Skills Needed**: GitHub Actions, Docker, CI/CD
- **Why Good**: Well-defined scope, with the existing CentOS 8 implementation as a reference. A great introduction to Gluten's testing infrastructure.

### #11509: TreeMemoryConsumer Thread Safety

- **Skills Needed**: Java concurrency, memory management
- **Why Good**: Clear problem description with error examples. Excellent for understanding Gluten's memory architecture.

### #11501: Docker Java Dependencies Caching

- **Skills Needed**: Docker, Maven, build optimization
- **Why Good**: Tangible performance impact with clear success metrics. A good entry point into build system improvements.

### #11513: Iceberg `input_file_name()` Fix

- **Skills Needed**: File format handling, debugging
- **Why Good**: Isolated issue with clear expected behavior. A good introduction to file format integration.

### #11400: Spark 4.1 Test Suite Completion

- **Skills Needed**: Spark internals, testing
- **Why Good**: Multiple sub-tasks available with varying complexity. An excellent way to learn Spark compatibility requirements.

GitHub link: https://github.com/apache/incubator-gluten/discussions/11530
