GitHub user GlutenPerfBot created a discussion: January 23, 2026: Weekly Status Update in Gluten
*This weekly update is generated by LLMs. You're welcome to join our [GitHub](https://github.com/apache/incubator-gluten/discussions) for in-depth discussions.*

## Overall Activity Summary

The Apache Gluten project has been highly active over the past 7 days with 47 pull requests and 24 issues, showing strong momentum across multiple fronts. Key themes include Spark 4.1 compatibility improvements, Delta Lake optimizations, ANSI mode support, and infrastructure enhancements. The community is actively addressing performance issues, compatibility challenges, and expanding platform support.

## Key Ongoing Projects

**Spark 4.1 Compatibility & ANSI Mode Support**

- Major effort led by @baibaichen to track and fix Spark 4.1.x unit test failures (#11400), with significant progress on union partitioning support and Python 3.10 migration
- @malinjawi is driving ANSI mode support implementation (#10134) with string-to-boolean casting completed (#11437)
- @ReemaAlzaid updated CI to Python 3.10 for Spark 4.1 compatibility (#11481)

**Delta Lake Performance Optimizations**

- @zhztheplayer is leading comprehensive Delta write optimizations (#10215) with multiple PRs addressing native statistics tracking (#11419), redundant C2R2C transitions (#11478), and V1 fallback writers (#11479)
- @FelixYBW identified and is addressing OOM issues during Delta table creation (#11421)

**Infrastructure & Build Improvements**

- @zhouyuan added RHEL 9.7 support (#11460) and Maven version upgrades (#11431, #11470)
- @jinchengchenghh simplified cuDF build configuration (#11407) and fixed CI reporting issues (#11462)
- Daily Velox version updates are being maintained by @GlutenPerfBot

## Priority Items

**Critical Performance Issues**

- #8417: TPC-DS Q72 performance regression needs immediate attention - Gluten showing significantly worse performance than vanilla Spark
- #11421: OOM during Delta table creation with clustering - affects production workloads
- #11397: Hive partitioned output failures with exit code 134

**Compatibility & Stability**

- #11406: Flaky test in UnsafeColumnarBuildSideRelationTest causing CI instability
- #11473: JVM crashes in JavaRssClient release - production stability issue
- #11369: ClickHouse adaptive query execution test failures

**Build & Platform Support**

- #11390: ARM64 compiler flag inconsistencies causing xsimd initialization errors
- #11445: Dynamic off-heap sizing configuration issues

## Notable Discussions

**Performance Benchmarking Discussion**

- #11463: Community member @shadowmmu seeking TPC-DS 1TB benchmarking results for non-partitioned Delta tables with Velox backend, highlighting need for more comprehensive performance data on lakehouse formats

**Brand Compliance Initiative**

- #11438: @weiting-chen leading effort to ensure proper "Apache Gluten" branding across vendor documentation, with several vendors already updated

## Emerging Trends

1. **Lakehouse Format Maturation**: Significant focus on Delta Lake optimizations with native write path improvements and performance tuning
2. **Spark 4.1 Migration**: Major push towards full Spark 4.1 compatibility with ANSI mode support becoming critical
3. **Platform Diversification**: Expanding support for ARM64, RHEL 9.7, and different JDK versions
4. **Memory Management**: Multiple issues around off-heap memory management and OOM prevention
5. **CI/CD Stabilization**: Ongoing efforts to reduce flaky tests and improve build reliability

## Good First Issues

**#11400: Spark 4.1 Unit Test Fixes**

- **What**: Help fix failing Spark 4.1 unit tests across various components
- **Skills Needed**: Scala, Spark internals, test debugging
- **Why Good**: Well-documented with specific test cases, clear acceptance criteria, and community support

**#10134: ANSI Mode Support Implementation**

- **What**: Implement ANSI-compliant casting functions for string-to-timestamp, string-to-date conversions
- **Skills Needed**: C++, Velox functions, Spark SQL semantics
- **Why Good**: Clear specifications with Spark reference implementations, good for learning Velox function development

**#11417: Qualification Tool Enhancement**

- **What**: Add lakehouse format detection (Iceberg, Delta, Hudi, Paimon) to the qualification tool
- **Skills Needed**: Java, pattern matching, data format knowledge
- **Why Good**: Self-contained tool enhancement with clear requirements and test coverage expectations

**#8984: Varchar to Timestamp Casting Corner Cases**

- **What**: Fix edge cases in varchar to timestamp casting uncovered by Spark tests
- **Skills Needed**: C++, date/time parsing, edge case analysis
- **Why Good**: Focused scope with existing test cases, good for understanding Velox type system

GitHub link: https://github.com/apache/incubator-gluten/discussions/11483
