*This weekly update is generated by LLMs. You're welcome to join our GitHub
<https://github.com/apache/incubator-gluten/discussions> for in-depth
discussions.*

Overall Activity Summary

Hello, Gluten community! It's been another productive week with a high
level of activity across the repository. We saw dozens of pull requests
merged, focusing on core refactoring, dependency updates, and bug fixes.
The Velox backend continues to see significant enhancements, and there's a
strong, sustained push to mature the Flink integration. Key themes this
week include improving ANSI SQL compliance, expanding data lake support to
include Paimon, and preparing for the upcoming Spark 3.5.5 upgrade.

Key Ongoing Projects

Development is buzzing on several major initiatives that are shaping the
future of Gluten:

   - Flink Integration: The Flink backend is rapidly maturing. Recent work
   includes a large PR by @KevinyhZou <https://github.com/KevinyhZou> to
   fix unit-test failures ([GLUTEN-10361][FLINK] Fix UT failure between the
   conversion of BinaryRowData and StatefulRecord
    apache/incubator-gluten#10362
   <https://github.com/apache/incubator-gluten/pull/10362>), an effort by
   @shuai-xu <https://github.com/shuai-xu> to add stateful operations
   support ([Gluten-10317][FLINK] Support state related operation
    apache/incubator-gluten#10320
   <https://github.com/apache/incubator-gluten/pull/10320>), and
   documentation improvements by @zjuwangg <https://github.com/zjuwangg> to
   help new users get started ([DOC][FLINK] Improve and correct Flink doc
    apache/incubator-gluten#10308
   <https://github.com/apache/incubator-gluten/pull/10308>).
   - Apache Paimon Support: Work is underway to integrate Apache Paimon, a
   popular streaming data lake platform. A significant PR from @liujiayi771
   <https://github.com/liujiayi771> ([GLUTEN-9337][VL] Support read Paimon
   non-PK table apache/incubator-gluten#10186
   <https://github.com/apache/incubator-gluten/pull/10186>) adds support
   for reading Paimon non-PK tables, which will greatly expand Gluten's data
   source capabilities.
   - Spark Version Upgrade: The long-running effort to upgrade our Spark
   dependency continues. The PR to bump to Spark 3.5.5 ([GLUTEN-8889][CORE]
   Bump Spark version from 3.5.2 to 3.5.5 apache/incubator-gluten#8890
   <https://github.com/apache/incubator-gluten/pull/8890>) by @jackylee-ch
   <https://github.com/jackylee-ch> has seen extensive discussion and is a
   critical step for keeping Gluten aligned with the Spark ecosystem.
   - ANSI Mode Compliance: We're improving our adherence to the SQL
   standard. @nimesh1601 <https://github.com/nimesh1601> is leading this
   charge with PRs to support arithmetic expressions ([GLUTEN-10356][VL]
   Support arithmetic expression with ansi mode
    apache/incubator-gluten#10357
   <https://github.com/apache/incubator-gluten/pull/10357>) and the abs
   function ([GLUTEN-10371][VL]Support abs function support with ansi
    apache/incubator-gluten#10372
   <https://github.com/apache/incubator-gluten/pull/10372>) in ANSI mode.
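
For readers new to ANSI mode, the sketch below illustrates the behavioral
difference these PRs are concerned with: under Spark's ANSI mode, integer
arithmetic that overflows raises an error, while the legacy (non-ANSI)
behavior wraps around silently. This is a plain-Python illustration of
32-bit semantics, not Gluten or Velox code; the helper names (wrap_int32,
add_int32, abs_int32) are made up for this example.

```python
# Illustration only: simulates 32-bit integer semantics in plain Python.
# The helper names here are hypothetical and are not part of Gluten, Velox, or Spark.
INT_MIN, INT_MAX = -2**31, 2**31 - 1


def wrap_int32(value: int) -> int:
    """Wrap a value into signed 32-bit two's complement (legacy behavior)."""
    return (value - INT_MIN) % 2**32 + INT_MIN


def add_int32(a: int, b: int, ansi: bool) -> int:
    """Add two 32-bit ints; raise on overflow in ANSI mode, wrap otherwise."""
    result = a + b
    if INT_MIN <= result <= INT_MAX:
        return result
    if ansi:
        raise ArithmeticError("integer overflow")
    return wrap_int32(result)


def abs_int32(a: int, ansi: bool) -> int:
    """Absolute value of a 32-bit int; abs(INT_MIN) does not fit in 32 bits."""
    result = abs(a)
    if result <= INT_MAX:
        return result
    if ansi:
        raise ArithmeticError("integer overflow")
    return wrap_int32(result)


if __name__ == "__main__":
    print(add_int32(INT_MAX, 1, ansi=False))  # wraps around to INT_MIN
    print(abs_int32(INT_MIN, ansi=False))     # stays at INT_MIN
    try:
        add_int32(INT_MAX, 1, ansi=True)
    except ArithmeticError as err:
        print("ANSI mode:", err)
```

A native backend has to reproduce the raising behavior exactly when ANSI
mode is enabled, which is why arithmetic and abs each need dedicated support.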

Priority Items

We encourage our community members to review these important PRs to help us
move them forward:

   - [GLUTEN-8889][CORE] Bump Spark version from 3.5.2 to 3.5.5
    apache/incubator-gluten#8890
   <https://github.com/apache/incubator-gluten/pull/8890> by @jackylee-ch
   <https://github.com/jackylee-ch>: This Spark patch-release upgrade has
   received a lot of feedback and needs final reviews to ensure a smooth
   transition.
   - [GLUTEN-9337][VL] Support read Paimon non-PK table
    apache/incubator-gluten#10186
   <https://github.com/apache/incubator-gluten/pull/10186> by @liujiayi771
   <https://github.com/liujiayi771>: Adding Paimon support is a huge step.
   We need eyes on this large PR to validate the implementation and
   integration.
   - [GLUTEN-8227][VL] fix: Update sort elimination rules for Hash Aggregate
    apache/incubator-gluten#9473
   <https://github.com/apache/incubator-gluten/pull/9473> by @acvictor
   <https://github.com/acvictor>: This PR updates sort elimination rules
   for Hash Aggregate and has generated significant discussion. Your expert
   review on this complex optimization would be invaluable.
   - [GLUTEN-10361][FLINK] Fix UT failure between the conversion of
   BinaryRowData and StatefulRecord apache/incubator-gluten#10362
   <https://github.com/apache/incubator-gluten/pull/10362> by @KevinyhZou
   <https://github.com/KevinyhZou>: This large fix for the Flink backend is
   crucial for stability. We'd appreciate help with testing and review.

Notable Discussions

Several important conversations are happening that will influence the
project's direction:

   - Add a new backend: Omni apache/incubator-gluten#10188
   <https://github.com/apache/incubator-gluten/discussions/10188>: A
   proposal by @wjunLu <https://github.com/wjunLu> to add Omni, a new
   ARM-optimized backend. This is a major strategic discussion, and we invite
   everyone to share their thoughts on expanding Gluten's hardware support.
   - Gluten 1.5.0 Release apache/incubator-gluten#10327
   <https://github.com/apache/incubator-gluten/discussions/10327>: Release
   Manager @PHILO-HE <https://github.com/PHILO-HE> has announced the plan
   for the Gluten 1.5.0 release, with a code freeze targeted for August 15,
   2025. Please review the timeline and prioritize your contributions.
   - Velox, GCS and cURL | CURL error [77]=Problem with the SSL CA cert
   (path? access rights?) apache/incubator-gluten#9946
   <https://github.com/apache/incubator-gluten/discussions/9946>: A user is
   facing an SSL certificate issue when using the Velox backend with GCS. If
   you have experience with GCS and cURL, your insights could help @A-Mongy
   <https://github.com/A-Mongy> and others resolve this problem.

Emerging Trends

Based on this week's activity, we're observing a few key trends:

   - Dependency Upgrades as a Continuous Process: Automated PRs from
   @GlutenPerfBot <https://github.com/GlutenPerfBot> and @kyligence-git
   <https://github.com/kyligence-git> are keeping our Velox and ClickHouse
   dependencies fresh ([GLUTEN-6887][VL] Daily Update Velox Version
   (2025_08_06) apache/incubator-gluten#10368
   <https://github.com/apache/incubator-gluten/pull/10368>,
   [GLUTEN-1632][CH]Daily Update Clickhouse Version (20250804)
   apache/incubator-gluten#10341
   <https://github.com/apache/incubator-gluten/pull/10341>). This ensures
   we benefit from the latest upstream improvements and security fixes.
   - Focus on Code Quality and Refactoring: A wave of contributions from
   @beliefer <https://github.com/beliefer> ([GLUTEN-10351] Extract
   immutable collection as reusable field apache/incubator-gluten#10353
   <https://github.com/apache/incubator-gluten/pull/10353>, [GLUTEN-10334]
   Share the results compare the spark version apache/incubator-gluten#10335
   <https://github.com/apache/incubator-gluten/pull/10335>,
   [GLUTEN-10309][CORE] Improve the implementation of NativeWritePostRule
    apache/incubator-gluten#10310
   <https://github.com/apache/incubator-gluten/pull/10310>) has focused on
   improving code style, removing redundancies, and enhancing maintainability.
   This dedication to code health is vital for the project's long-term success.
   - Expanding Function and Data Type Coverage: We continue to broaden our
   capabilities with new function support, such as timestampdiff in
   [GLUTEN-9809][VL] Add timestampdiff support apache/incubator-gluten#9810
   <https://github.com/apache/incubator-gluten/pull/9810> by @zml1206
   <https://github.com/zml1206> and casting from array to string in [VL]
   Support cast from array to string apache/incubator-gluten#10300
   <https://github.com/apache/incubator-gluten/pull/10300> by @zml1206
   <https://github.com/zml1206>.

Good First Issues

Looking to make your first contribution to Gluten? These issues are great
starting points:

   - [CH] support expression MakeYMInterval apache/incubator-gluten#6814
   <https://github.com/apache/incubator-gluten/issues/6814>: Support the
   MakeYMInterval expression for the ClickHouse backend. This is a
   well-scoped task that involves implementing a single function. It's a
   perfect way to learn the contribution workflow and how Gluten handles
   date/time intervals. You'll need some C++ knowledge and can follow the
   patterns from other function implementations.
   - [CH] support split_part function apache/incubator-gluten#6807
   <https://github.com/apache/incubator-gluten/issues/6807>: Add support
   for the split_part function in the ClickHouse backend. This is another
   excellent, self-contained task for a new contributor. It will help you
   understand how string manipulation functions are integrated into the native
   engine. Basic C++ and string processing skills are all you need to get
   started.
   - [CH] support function SparkPartitionID apache/incubator-gluten#6812
   <https://github.com/apache/incubator-gluten/issues/6812>: Implement the
   SparkPartitionID function for the ClickHouse backend. This issue is a
   great opportunity to learn how Gluten interacts with Spark's internal
   execution context. It's a small, focused task ideal for a first-time
   contributor with an interest in the bridge between Spark and native code.
