GitHub user GlutenPerfBot created a discussion: August 15, 2025: Weekly Status Update in Gluten
*This weekly update is generated by LLMs. You're welcome to join our [GitHub](https://github.com/apache/incubator-gluten/discussions) for in-depth discussions.*

## Overall Activity Summary

This week saw a high level of activity in the Gluten community, with a strong focus on preparing for the future. Development was dominated by efforts to support **Spark 4.0**, enhance the **Velox backend** with new features, and mature the experimental **Flink integration**. We also saw significant refactoring efforts aimed at simplifying the codebase. Automated dependency updates for both Velox and ClickHouse backends continued to keep the project in sync with its core engines.

## Key Ongoing Projects

Several major initiatives are underway, pushing the boundaries of Gluten's capabilities:

* **Spark 4.0 Compatibility:** A major community effort is focused on ensuring Gluten is ready for Spark 4.0. Foundational work was recently merged, and contributors are now tackling follow-up tasks like fixing compilation issues in #10434 by @zjuwangg and adding CI checks in #10439 by @PHILO-HE.
* **Velox Backend Feature Enhancements:** The Velox backend received significant feature contributions. Key highlights include:
    * Support for casting complex data types in #10443 by @kevinwilfong.
    * Adding support for Iceberg's copy-on-write operations in #10458 by @jinchengchenghh.
    * Enabling reading for Paimon non-primary-key tables in #10186 by @liujiayi771.
* **Broadcast Hash Join (BHJ) Optimization:** The long-running effort to optimize BHJ performance in the Velox backend continues in #8931 by @JkSelf. This highly anticipated PR aims to bring significant performance gains and has gathered extensive community feedback.
* **Flink Integration:** The integration with Apache Flink is maturing with #10320 by @shuai-xu, which adds support for stateful operations, a critical feature for stream processing.

## Priority Items

We encourage the community to review and provide feedback on these important pull requests:

* **Codebase Simplification:** A large-scale refactoring is proposed in #10453 by @marin-ma to remove hardware accelerator support. This change impacts many files across the project and requires careful review to ensure a smooth transition.
* **Complex Type Casting:** The PR #10443 by @kevinwilfong introduces a powerful new capability for handling complex types. Community review would be valuable to validate the implementation across various use cases.
* **Common Subexpression Elimination:** The draft PR #9999 by @wypb aims to apply common subexpression elimination to optimize Spark logical plans. This is a complex but impactful optimization that would benefit from community input on its design and path forward.

## Notable Discussions

Several important conversations are happening that will shape the future of Gluten:

* **Proposal for a New "Omni" Backend:** A new discussion was started in #10188 by @wjunLu proposing the addition of an "Omni" backend, specifically optimized for ARM architectures. This is a significant proposal, and we invite the community to share their thoughts.
* **Dropping Spark 3.2 Support:** As work on Spark 4.0 progresses, a discussion is underway in #10407 about dropping support for Spark 3.2. This is a key roadmap decision that will help streamline maintenance and development.
* **Flink Support:** The general discussion on Flink integration continues in #8849, serving as a central point for design and implementation questions as this exciting new feature evolves.

## Emerging Trends

Based on this week's activity, we've identified several key trends:

* **Preparing for the Next Generation of Spark:** The concentrated effort on Spark 4.0 compatibility indicates a forward-looking approach, ensuring Gluten users can seamlessly upgrade to the latest Spark version.
* **Expanding the Data Ecosystem:** With new support for Iceberg, Paimon, and complex types, Gluten is strengthening its position as a versatile accelerator for a wide range of modern data formats and structures.
* **Growing Interest in ARM:** The proposal for an ARM-native backend and related user questions suggest an emerging demand for high-performance data processing on ARM-based infrastructure.

## Good First Issues

Looking to make your first contribution to Gluten? These issues are well-defined and a great way to get started:

* **#6814**: Implement the `MakeYMInterval` expression for the ClickHouse backend.
* **#4730**: Add support for the `date_from_unix_date` function in the ClickHouse backend.
* **#6807**: Implement the `split_part` function for the ClickHouse backend.
* **#6812**: Add support for the `SparkPartitionID` function in the ClickHouse backend.

These tasks are a great entry point for contributors with some C++ and Scala/Java experience. They involve implementing a single, well-scoped function, allowing you to get familiar with the codebase without needing to understand the entire system.

Welcome to the community.

GitHub link: https://github.com/apache/incubator-gluten/discussions/10459

----

This is an automatically sent email for [email protected]. To unsubscribe, please send an email to: [email protected]
