*This weekly update is generated by LLMs. You're welcome to join our GitHub <https://github.com/apache/incubator-gluten/discussions> for in-depth discussions.*

Overall Activity Summary

Hello, Gluten community! It's been another productive week with a high level of activity across the repository. We saw dozens of pull requests merged, focusing on core refactoring, dependency updates, and bug fixes. The Velox backend continues to see significant enhancements, and there's a strong, sustained push to mature the Flink integration. Key themes this week include improving ANSI SQL compliance, expanding data lake support to include Paimon, and preparing for the upcoming Spark 3.5.5 upgrade.

Key Ongoing Projects

Development is buzzing on several major initiatives that are shaping the future of Gluten:

- Flink Integration: The Flink backend is rapidly maturing. Recent work includes a large PR by @KevinyhZou <https://github.com/KevinyhZou> to fix UT failures ([GLUTEN-10361][FLINK] Fix UT failure between the conversion of BinaryRowData and StatefulRecord apache/incubator-gluten#10362 <https://github.com/apache/incubator-gluten/pull/10362>), an effort by @shuai-xu <https://github.com/shuai-xu> to add stateful operations support ([Gluten-10317][FLINK] Support state related operation apache/incubator-gluten#10320 <https://github.com/apache/incubator-gluten/pull/10320>), and documentation improvements by @zjuwangg <https://github.com/zjuwangg> to help new users get started ([DOC][FLINK] Improve and correct Flink doc apache/incubator-gluten#10308 <https://github.com/apache/incubator-gluten/pull/10308>).
- Apache Paimon Support: Work is underway to integrate Apache Paimon, a popular streaming data lake platform. A significant PR from @liujiayi771 <https://github.com/liujiayi771> ([GLUTEN-9337][VL] Support read Paimon non-PK table apache/incubator-gluten#10186 <https://github.com/apache/incubator-gluten/pull/10186>) adds support for reading Paimon non-PK tables, which will greatly expand Gluten's data source capabilities.
- Spark Version Upgrade: The long-running effort to upgrade our Spark dependency continues.
The PR to bump to Spark 3.5.5 ([GLUTEN-8889][CORE] Bump Spark version from 3.5.2 to 3.5.5 apache/incubator-gluten#8890 <https://github.com/apache/incubator-gluten/pull/8890>) by @jackylee-ch <https://github.com/jackylee-ch> has seen extensive discussion and is a critical step for keeping Gluten aligned with the Spark ecosystem.
- ANSI Mode Compliance: We're improving our adherence to the SQL standard. @nimesh1601 <https://github.com/nimesh1601> is leading this charge with PRs to support arithmetic expressions ([GLUTEN-10356][VL] Support arithmetic expression with ansi mode apache/incubator-gluten#10357 <https://github.com/apache/incubator-gluten/pull/10357>) and the abs function ([GLUTEN-10371][VL]Support abs function support with ansi apache/incubator-gluten#10372 <https://github.com/apache/incubator-gluten/pull/10372>) in ANSI mode.

Priority Items

We encourage our community members to review these important PRs to help us move them forward:

- [GLUTEN-8889][CORE] Bump Spark version from 3.5.2 to 3.5.5 apache/incubator-gluten#8890 <https://github.com/apache/incubator-gluten/pull/8890> by @jackylee-ch <https://github.com/jackylee-ch>: This Spark version upgrade has received a lot of feedback and needs final reviews to ensure a smooth transition.
- [GLUTEN-9337][VL] Support read Paimon non-PK table apache/incubator-gluten#10186 <https://github.com/apache/incubator-gluten/pull/10186> by @liujiayi771 <https://github.com/liujiayi771>: Adding Paimon support is a huge step. We need eyes on this large PR to validate the implementation and integration.
- [GLUTEN-8227][VL] fix: Update sort elimination rules for Hash Aggregate apache/incubator-gluten#9473 <https://github.com/apache/incubator-gluten/pull/9473> by @acvictor <https://github.com/acvictor>: This PR updates sort elimination rules for Hash Aggregate and has generated significant discussion. Your expert review on this complex optimization would be invaluable.
- [GLUTEN-10361][FLINK] Fix UT failure between the conversion of BinaryRowData and StatefulRecord apache/incubator-gluten#10362 <https://github.com/apache/incubator-gluten/pull/10362> by @KevinyhZou <https://github.com/KevinyhZou>: This large fix for the Flink backend is crucial for stability. We'd appreciate help with testing and review.

Notable Discussions

Several important conversations are happening that will influence the project's direction:

- Add a new backend: Omni apache/incubator-gluten#10188 <https://github.com/apache/incubator-gluten/discussions/10188>: A proposal by @wjunLu <https://github.com/wjunLu> to add Omni, a new ARM-optimized backend. This is a major strategic discussion, and we invite everyone to share their thoughts on expanding Gluten's hardware support.
- Gluten 1.5.0 Release apache/incubator-gluten#10327 <https://github.com/apache/incubator-gluten/discussions/10327>: Release Manager @PHILO-HE <https://github.com/PHILO-HE> has announced the plan for the Gluten 1.5.0 release, with a code freeze targeted for August 15, 2025. Please review the timeline and prioritize your contributions.
- Velox, GCS and cURL | CURL error [77]=Problem with the SSL CA cert (path? access rights?) apache/incubator-gluten#9946 <https://github.com/apache/incubator-gluten/discussions/9946>: A user is facing an SSL certificate issue when using the Velox backend with GCS. If you have experience with GCS and cURL, your insights could help @A-Mongy <https://github.com/A-Mongy> and others resolve this problem.

Emerging Trends

Based on this week's activity, we're observing a few key trends:

- Dependency Upgrades as a Continuous Process: Automated PRs from @GlutenPerfBot <https://github.com/GlutenPerfBot> and @kyligence-git <https://github.com/kyligence-git> are keeping our Velox and ClickHouse dependencies fresh ([GLUTEN-6887][VL] Daily Update Velox Version (2025_08_06) apache/incubator-gluten#10368 <https://github.com/apache/incubator-gluten/pull/10368>, [GLUTEN-1632][CH]Daily Update Clickhouse Version (20250804) apache/incubator-gluten#10341 <https://github.com/apache/incubator-gluten/pull/10341>). This ensures we benefit from the latest upstream improvements and security fixes.
- Focus on Code Quality and Refactoring: A wave of contributions from @beliefer <https://github.com/beliefer> ([GLUTEN-10351] Extract immutable collection as reusable field apache/incubator-gluten#10353 <https://github.com/apache/incubator-gluten/pull/10353>, [GLUTEN-10334] Share the results compare the spark version apache/incubator-gluten#10335 <https://github.com/apache/incubator-gluten/pull/10335>, [GLUTEN-10309][CORE] Improve the implementation of NativeWritePostRule apache/incubator-gluten#10310 <https://github.com/apache/incubator-gluten/pull/10310>) has focused on improving code style, removing redundancies, and enhancing maintainability. This dedication to code health is vital for the project's long-term success.
- Expanding Function and Data Type Coverage: We continue to broaden our capabilities with new function support, such as timestampdiff in [GLUTEN-9809][VL] Add timestampdiff support apache/incubator-gluten#9810 <https://github.com/apache/incubator-gluten/pull/9810> by @zml1206 <https://github.com/zml1206> and casting from array to string in [VL] Support cast from array to string apache/incubator-gluten#10300 <https://github.com/apache/incubator-gluten/pull/10300> by @zml1206 <https://github.com/zml1206>.

Good First Issues

Looking to make your first contribution to Gluten?
These issues are great starting points:

- [CH] support expression MakeYMInterval apache/incubator-gluten#6814 <https://github.com/apache/incubator-gluten/issues/6814>: Support the MakeYMInterval expression for the ClickHouse backend. This is a well-scoped task that involves implementing a single function. It's a perfect way to learn the contribution workflow and how Gluten handles date/time intervals. You'll need some C++ knowledge and can follow the patterns from other function implementations.
- [CH] support split_part function apache/incubator-gluten#6807 <https://github.com/apache/incubator-gluten/issues/6807>: Add support for the split_part function in the ClickHouse backend. This is another excellent, self-contained task for a new contributor. It will help you understand how string manipulation functions are integrated into the native engine. Basic C++ and string processing skills are all you need to get started.
- [CH] support function SparkPartitionID apache/incubator-gluten#6812 <https://github.com/apache/incubator-gluten/issues/6812>: Implement the SparkPartitionID function for the ClickHouse backend. This issue is a great opportunity to learn how Gluten interacts with Spark's internal execution context. It's a small, focused task ideal for a first-time contributor with an interest in the bridge between Spark and native code.
