GitHub user GlutenPerfBot created a discussion: September 19, 2025: Weekly 
Status Update in Gluten

*This weekly update is generated by LLMs. You're welcome to join our 
[GitHub](https://github.com/apache/incubator-gluten/discussions) for in-depth 
discussions.*

## Overall Activity Summary
The past 7 days delivered 38 merged PRs and 29 open PRs. Velox backend 
dominates with daily version bumps, GPU/cuDF integration, and window-operator 
refactoring. Flink activity centers on Nexmark benchmark coverage and UDF 
support. Iceberg and Delta Lake features continue to mature, while build hygiene 
and CI improvements keep pace.

## Key Ongoing Projects
- **Daily Velox sync** – @GlutenPerfBot lands fresh commits every day (#10758, 
#10749, #10730, #10720, #10711) keeping the Velox backend on the bleeding edge.
- **GPU/cuDF connector** – @jinchengchenghh adds single-GPU task locking 
(#10684) and a cuDF Parquet sink (#10593); validation logic is still being 
refined (#10753).
- **Window-operator refactor** – @JkSelf removes SortWindow in favor of 
streaming-only execution (#10734, #10731) with follow-up performance tweaks 
(#10667).
- **Delta Lake PoC** – @zhztheplayer prototypes native write (#10216) and 
deletion-vector read (#10740) for Delta 2.4/3.3.
- **Flink Nexmark sprint** – @shuai-xu & @KevinyhZou add q11-q22 coverage, UDFs 
(`count_char`, `date_format`) and decimal support (#10735, #10757, #10248, 
#10628).

## Priority Items
- **Release 1.5.0 blockers** – @PHILO-HE tracking final back-ports (#10574); 
weekly build fixed by limiting Spark to 3.5 (#10750).
- **Critical correctness fixes** – @lgbo-ustc fixes empty aggregation keys in 
CH GroupLimit (#10746); @Zouxxyy eliminates redundant c2r/r2c for Iceberg 
partition write (#10714).
- **Memory leak & OOM** – the TableScan leak (#9456) is under active 
investigation; a 3× table-size blow-up after hash join (#10693) needs triage.
- **Flaky tests** – @jinchengchenghh stabilizes CH adaptive-query suite 
(#10756).

## Notable Discussions
- #10188: @wjunLu proposes new ARM-optimized **Omni** backend—community 
feedback invited on GPIP doc.
- #10717: @ryyyyyy1 asks how Flink’s `RowKind` (+I/+U/-U) should map to Velox 
`RowVector`—design open.
- #8429: Slack channel `#incubator-gluten` now live—ASF members and guests 
welcome.

## Emerging Trends
- **Lake-house acceleration** – daily PRs for Iceberg/Delta read/write, 
deletion vectors, column mapping.
- **Micro-performance focus** – hash-table build configs (#10634), lazy-vector 
metrics (#10726), batch-size soft limits (#10661).
- **Build hygiene** – spotless POM enforcement (#10755), Scala format checker 
(#10747), Spark 4.0 CI readiness (#10725).

## Good First Issues
- #6814: implement `MakeYMInterval` for ClickHouse—pure CH backend, no native 
code.
- #4730: add `date_from_unix_date` CH function—follow existing date-function 
pattern.
- #6807: support `split_part` string function in CH—straightforward string 
splitting.
- #6812: expose `SparkPartitionID` in CH backend—reuse Spark’s partition ID.
- #6815: implement `MapZipWith` for CH—entry-level map function, great for 
learning CH UDF framework.

All CH good-first issues require basic C++ and familiarity with ClickHouse 
function registration; unit tests and docs are expected.
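
For contributors picking up #4730 or #6807, a small reference model of the expected results can help when writing unit tests. The sketch below is plain Python, not Gluten or ClickHouse code. The helper names are hypothetical. The edge-case behavior follows Spark's documented semantics for `date_from_unix_date` and `split_part` as best understood here, so verify against the Spark function reference before relying on it.

```python
from datetime import date, timedelta


def date_from_unix_date(days: int) -> date:
    """Reference model (assumption): days since 1970-01-01 mapped to a date."""
    return date(1970, 1, 1) + timedelta(days=days)


def split_part(s: str, delimiter: str, part_num: int) -> str:
    """Reference model (assumption) of Spark-style split_part.

    - part_num is 1-based; negative values count from the end.
    - part_num == 0 is an error.
    - An out-of-range part_num yields an empty string.
    - An empty delimiter means the string is not split.
    """
    if part_num == 0:
        raise ValueError("part_num must not be 0")
    # The delimiter is treated as a literal string, not a regex.
    parts = [s] if delimiter == "" else s.split(delimiter)
    idx = part_num - 1 if part_num > 0 else len(parts) + part_num
    if idx < 0 or idx >= len(parts):
        return ""
    return parts[idx]
```

A native implementation could be checked against these helpers on a handful of inputs, including the zero, negative, and out-of-range `part_num` cases.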

GitHub link: https://github.com/apache/incubator-gluten/discussions/10759

----
This is an automatically sent email for [email protected].
To unsubscribe, please send an email to: [email protected]

