[D] August 15, 2025: Weekly Status Update in Gluten [incubator-gluten]

via GitHub Fri, 15 Aug 2025 13:17:59 -0700


GitHub user GlutenPerfBot created a discussion: August 15, 2025: Weekly Status 
Update in Gluten


*This weekly update is generated by LLMs. You're welcome to join our 
[GitHub Discussions](https://github.com/apache/incubator-gluten/discussions) for in-depth 
conversations.*

## Overall Activity Summary
This week saw a high level of activity in the Gluten community, with a strong 
focus on preparing for the future. Development was dominated by efforts to 
support **Spark 4.0**, enhance the **Velox backend** with new features, and 
mature the experimental **Flink integration**. We also saw significant 
refactoring efforts aimed at simplifying the codebase. Automated dependency 
updates for both Velox and ClickHouse backends continued to keep the project in 
sync with its core engines.

## Key Ongoing Projects
Several major initiatives are underway to extend Gluten's capabilities:

*   **Spark 4.0 Compatibility:** A major community effort is focused on 
ensuring Gluten is ready for Spark 4.0. Foundational work was recently merged, 
and contributors are now tackling follow-up tasks like fixing compilation 
issues in #10434 by @zjuwangg and adding CI checks in #10439 by @PHILO-HE.
*   **Velox Backend Feature Enhancements:** The Velox backend received 
significant feature contributions. Key highlights include:
    *   Support for casting complex data types in #10443 by @kevinwilfong.
    *   Support for Iceberg copy-on-write operations in #10458 by 
@jinchengchenghh.
    *   Support for reading Paimon non-primary-key tables in #10186 by 
@liujiayi771.
*   **Broadcast Hash Join (BHJ) Optimization:** The long-running effort to 
optimize BHJ performance in the Velox backend continues in #8931 by @JkSelf. 
This highly anticipated PR aims to bring significant performance gains and has 
gathered extensive community feedback.
*   **Flink Integration:** The integration with Apache Flink is maturing with 
#10320 by @shuai-xu, which adds support for stateful operations, a critical 
feature for stream processing.

## Priority Items
We encourage the community to review and provide feedback on these important 
pull requests:

*   **Codebase Simplification:** A large-scale refactoring is proposed in 
#10453 by @marin-ma to remove hardware accelerator support. This change impacts 
many files across the project and requires careful review to ensure a smooth 
transition.
*   **Complex Type Casting:** PR #10443 by @kevinwilfong adds casting support 
for complex data types in the Velox backend. Community review would be 
valuable to validate the implementation across various use cases.
*   **Common Subexpression Elimination:** The draft PR #9999 by @wypb aims to 
apply common subexpression elimination to optimize Spark logical plans. This is 
a complex but impactful optimization that would benefit from community input on 
its design and path forward.

## Notable Discussions
Several important conversations are happening that will shape the future of 
Gluten:

*   **Proposal for a New "Omni" Backend:** A new discussion was started in 
#10188 by @wjunLu proposing the addition of an "Omni" backend, specifically 
optimized for ARM architectures. This is a significant proposal, and we invite 
the community to share their thoughts.
*   **Dropping Spark 3.2 Support:** As work on Spark 4.0 progresses, a 
discussion is underway in #10407 about dropping support for Spark 3.2. This is 
a key roadmap decision that will help streamline maintenance and development.
*   **Flink Support:** The general discussion on Flink integration continues in 
#8849, serving as a central point for design and implementation questions as 
this exciting new feature evolves.

## Emerging Trends
Based on this week's activity, we've identified several key trends:

*   **Preparing for the Next Generation of Spark:** The concentrated effort on 
Spark 4.0 compatibility indicates a forward-looking approach, ensuring Gluten 
users can seamlessly upgrade to the latest Spark version.
*   **Expanding the Data Ecosystem:** With new support for Iceberg, Paimon, and 
complex types, Gluten is strengthening its position as a versatile accelerator 
for a wide range of modern data formats and structures.
*   **Growing Interest in ARM:** The proposal for an ARM-native backend and 
related user questions suggest an emerging demand for high-performance data 
processing on ARM-based infrastructure.

## Good First Issues
Looking to make your first contribution to Gluten? These issues are 
well-defined and a great way to get started:

*   **#6814**: Implement the `MakeYMInterval` expression for the ClickHouse 
backend.
*   **#4730**: Add support for the `date_from_unix_date` function in the 
ClickHouse backend.
*   **#6807**: Implement the `split_part` function for the ClickHouse backend.
*   **#6812**: Add support for the `SparkPartitionID` function in the 
ClickHouse backend.

These tasks are a good entry point for contributors with some C++ and 
Scala/Java experience. Each one involves implementing a single, well-scoped 
function, so you can get familiar with the codebase without needing to 
understand the entire system. Welcome to the community!

GitHub link: https://github.com/apache/incubator-gluten/discussions/10459
