GitHub user GlutenPerfBot created a discussion: January 23, 2026: Weekly Status 
Update in Gluten

*This weekly update is generated by LLMs. You're welcome to join our 
[GitHub](https://github.com/apache/incubator-gluten/discussions) for in-depth 
discussions.*

## Overall Activity Summary
The Apache Gluten project saw 47 pull requests and 24 issues over the past 7 
days. Key themes include Spark 4.1 compatibility improvements, Delta Lake 
optimizations, ANSI mode support, and infrastructure enhancements. The 
community is also addressing performance issues and compatibility challenges, 
and expanding platform support.

## Key Ongoing Projects

**Spark 4.1 Compatibility & ANSI Mode Support**
- Major effort led by @baibaichen to track and fix Spark 4.1.x unit test 
failures (#11400), with significant progress on union partitioning support and 
Python 3.10 migration
- @malinjawi is driving ANSI mode support implementation (#10134) with 
string-to-boolean casting completed (#11437)
- @ReemaAlzaid updated CI to Python 3.10 for Spark 4.1 compatibility (#11481)
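
To make the ANSI item above concrete, here is a minimal Python sketch of how a string-to-boolean cast differs between ANSI and non-ANSI mode, based on Spark SQL's behavior as commonly documented: a small set of case-insensitive literals is accepted after trimming whitespace, and invalid input yields NULL in legacy mode but raises an error in ANSI mode. This is not the Gluten or Velox code from #11437. The function name, the literal sets as written here, and the use of `ValueError` in place of Spark's runtime exception are illustrative assumptions.

```python
from typing import Optional

# Literals assumed to match Spark's accepted true/false strings (case-insensitive).
TRUE_STRINGS = {"t", "true", "y", "yes", "1"}
FALSE_STRINGS = {"f", "false", "n", "no", "0"}


def cast_string_to_boolean(value: Optional[str], ansi_enabled: bool) -> Optional[bool]:
    """Hypothetical helper illustrating the cast semantics, not a real API."""
    if value is None:
        return None  # NULL input stays NULL in both modes
    normalized = value.strip().lower()
    if normalized in TRUE_STRINGS:
        return True
    if normalized in FALSE_STRINGS:
        return False
    if ansi_enabled:
        # Stand-in for the runtime error Spark raises on invalid input in ANSI mode.
        raise ValueError(f"cannot cast {value!r} to BOOLEAN")
    return None  # legacy (non-ANSI) mode returns NULL on invalid input
```

A native implementation has to reproduce both branches exactly, which is why each cast is tracked and tested separately under the ANSI effort.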

**Delta Lake Performance Optimizations**
- @zhztheplayer is leading comprehensive Delta write optimizations (#10215) 
with multiple PRs addressing native statistics tracking (#11419), redundant 
C2R2C transitions (#11478), and V1 fallback writers (#11479)
- @FelixYBW identified and is addressing OOM issues during Delta table creation 
(#11421)

**Infrastructure & Build Improvements**
- @zhouyuan added RHEL 9.7 support (#11460) and Maven version upgrades (#11431, 
#11470)
- @jinchengchenghh simplified cuDF build configuration (#11407) and fixed CI 
reporting issues (#11462)
- Daily Velox version updates are being maintained by @GlutenPerfBot

## Priority Items

**Critical Performance Issues**
- #8417: TPC-DS Q72 performance regression, with Gluten performing 
significantly worse than vanilla Spark; needs immediate attention
- #11421: OOM during Delta table creation with clustering - affects production 
workloads
- #11397: Hive partitioned output failures with exit code 134

**Compatibility & Stability**
- #11406: Flaky test in UnsafeColumnarBuildSideRelationTest causing CI 
instability
- #11473: JVM crashes in JavaRssClient release - production stability issue
- #11369: ClickHouse adaptive query execution test failures

**Build & Platform Support**
- #11390: ARM64 compiler flag inconsistencies causing xsimd initialization 
errors
- #11445: Dynamic off-heap sizing configuration issues

## Notable Discussions

**Performance Benchmarking Discussion**
- #11463: Community member @shadowmmu is seeking TPC-DS 1TB benchmarking 
results for non-partitioned Delta tables with the Velox backend, highlighting 
the need for more comprehensive performance data on lakehouse formats

**Brand Compliance Initiative**
- #11438: @weiting-chen is leading an effort to ensure proper "Apache Gluten" 
branding across vendor documentation, with several vendors' documentation 
already updated

## Emerging Trends

1. **Lakehouse Format Maturation**: Significant focus on Delta Lake 
optimizations with native write path improvements and performance tuning
2. **Spark 4.1 Migration**: Major push towards full Spark 4.1 compatibility 
with ANSI mode support becoming critical
3. **Platform Diversification**: Expanding support for ARM64, RHEL 9.7, and 
different JDK versions
4. **Memory Management**: Multiple issues around off-heap memory management and 
OOM prevention
5. **CI/CD Stabilization**: Ongoing efforts to reduce flaky tests and improve 
build reliability

## Good First Issues

**#11400: Spark 4.1 Unit Test Fixes**
- **What**: Help fix failing Spark 4.1 unit tests across various components
- **Skills Needed**: Scala, Spark internals, test debugging
- **Why Good**: Well-documented with specific test cases, clear acceptance 
criteria, and community support

**#10134: ANSI Mode Support Implementation**
- **What**: Implement ANSI-compliant casting functions for string-to-timestamp 
and string-to-date conversions
- **Skills Needed**: C++, Velox functions, Spark SQL semantics
- **Why Good**: Clear specifications with Spark reference implementations, good 
for learning Velox function development

**#11417: Qualification Tool Enhancement**
- **What**: Add lakehouse format detection (Iceberg, Delta, Hudi, Paimon) to 
the qualification tool
- **Skills Needed**: Java, pattern matching, data format knowledge
- **Why Good**: Self-contained tool enhancement with clear requirements and 
test coverage expectations
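
As a rough illustration of the pattern-matching approach this issue calls for, the sketch below classifies a table's data source string by lakehouse format. It is written in Python for brevity and is not taken from the qualification tool. The function name, the keyword table, and the assumption that the tool sees a provider or class-name string are all hypothetical.

```python
from typing import Optional

# Hypothetical keyword table: each keyword is assumed to appear in the
# provider or class name of the corresponding format's Spark integration.
FORMAT_KEYWORDS = {
    "Iceberg": "iceberg",
    "Delta": "delta",
    "Hudi": "hudi",
    "Paimon": "paimon",
}


def detect_lakehouse_format(provider: str) -> Optional[str]:
    """Return the lakehouse format whose keyword occurs in `provider`, else None."""
    lowered = provider.lower()
    for format_name, keyword in FORMAT_KEYWORDS.items():
        if keyword in lowered:
            return format_name
    return None
```

Plain substring matching is deliberately naive. A real implementation would likely match on fully qualified class names or plan node types to avoid false positives, and would need the test coverage the issue asks for.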

**#8984: Varchar to Timestamp Casting Corner Cases**
- **What**: Fix edge cases in varchar to timestamp casting uncovered by Spark 
tests
- **Skills Needed**: C++, date/time parsing, edge case analysis
- **Why Good**: Focused scope with existing test cases, good for understanding 
Velox type system

GitHub link: https://github.com/apache/incubator-gluten/discussions/11483

----
This is an automatically sent email for [email protected].
To unsubscribe, please send an email to: [email protected]

