GitHub user GlutenPerfBot created a discussion: September 19, 2025: Weekly Status Update in Gluten
*This weekly update is generated by LLMs. You're welcome to join our [GitHub](https://github.com/apache/incubator-gluten/discussions) for in-depth discussions.*

## Overall Activity Summary

The past 7 days delivered 38 merged PRs and 29 open PRs. The Velox backend dominates with daily version bumps, GPU/cuDF integration, and window-operator refactoring. Flink activity centers on Nexmark benchmark coverage and UDF support. Iceberg/Delta Lake features continue to mature, while build hygiene and CI improvements keep pace.

## Key Ongoing Projects

- **Daily Velox sync** – @GlutenPerfBot lands fresh commits every day (#10758, #10749, #10730, #10720, #10711), keeping the Velox backend on the bleeding edge.
- **GPU/cuDF connector** – @jinchengchenghh adds single-GPU task locking (#10684) and a cuDF parquet sink (#10593); validation logic is still being refined (#10753).
- **Window-operator refactor** – @JkSelf removes SortWindow in favor of streaming-only execution (#10734, #10731), with follow-up performance tweaks (#10667).
- **Delta Lake PoC** – @zhztheplayer prototypes native write (#10216) and deletion-vector read (#10740) for Delta 2.4/3.3.
- **Flink Nexmark sprint** – @shuai-xu & @KevinyhZou add q11-q22 coverage, UDFs (`count_char`, `date_format`) and decimal support (#10735, #10757, #10248, #10628).

## Priority Items

- **Release 1.5.0 blockers** – @PHILO-HE is tracking final back-ports (#10574); the weekly build was fixed by limiting Spark to 3.5 (#10750).
- **Critical correctness fixes** – @lgbo-ustc fixes empty aggregation keys in CH GroupLimit (#10746); @Zouxxyy eliminates redundant c2r/r2c for Iceberg partition write (#10714).
- **Memory leak & OOM** – #9456 TableScan leak is under active investigation; #10693 3× table-size blow-up after hash-join needs triage.
- **Flaky tests** – @jinchengchenghh stabilizes the CH adaptive-query suite (#10756).

## Notable Discussions

- #10188: @wjunLu proposes a new ARM-optimized **Omni** backend; community feedback is invited on the GPIP doc.
- #10717: @ryyyyyy1 asks how Flink’s `RowKind` (+I/+U/-U) should map to Velox `RowVector`; the design is still open.
- #8429: Slack channel `#incubator-gluten` is now live; ASF members and guests welcome.

## Emerging Trends

- **Lake-house acceleration** – daily PRs for Iceberg/Delta read/write, deletion vectors, column mapping.
- **Micro-performance focus** – hash-table build configs (#10634), lazy-vector metrics (#10726), batch-size soft limits (#10661).
- **Build hygiene** – spotless POM enforcement (#10755), Scala format checker (#10747), Spark 4.0 CI readiness (#10725).

## Good First Issues

- #6814: implement `MakeYMInterval` for ClickHouse; pure CH backend, no native code.
- #4730: add `date_from_unix_date` CH function; follow the existing date-function pattern.
- #6807: support `split_part` string function in CH; straightforward string splitting.
- #6812: expose `SparkPartitionID` in CH backend; reuse Spark’s partition ID.
- #6815: implement `MapZipWith` for CH; entry-level map function, great for learning the CH UDF framework.

All CH good-first issues need basic C++ and ClickHouse function registration; unit tests & docs are expected.

GitHub link: https://github.com/apache/incubator-gluten/discussions/10759
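
For anyone picking up #4730 or #6807 from the good-first issues above, the following is a minimal reference sketch of the behavior a backend implementation would presumably need to match. It is written in plain Python and is not Gluten or ClickHouse code: the helper names simply mirror the Spark SQL function names, and the edge-case handling (1-based indexing, negative part numbers, empty delimiter, out-of-range parts) follows our reading of the Spark SQL function documentation, so verify it against the Spark version you target before writing tests.

```python
from datetime import date, timedelta


def date_from_unix_date(days: int) -> date:
    """Reference model: interpret `days` as days since 1970-01-01."""
    return date(1970, 1, 1) + timedelta(days=days)


def split_part(s: str, delimiter: str, part_num: int) -> str:
    """Reference model of split_part as we understand Spark SQL's semantics.

    Assumptions (check against the Spark docs for your version):
    - part_num is 1-based; 0 is an error
    - negative part_num counts from the end
    - an empty delimiter means the string is not split
    - an out-of-range part yields the empty string
    """
    if part_num == 0:
        raise ValueError("part_num must not be 0")
    # Python's str.split rejects an empty separator, so handle it explicitly.
    parts = [s] if delimiter == "" else s.split(delimiter)
    index = part_num - 1 if part_num > 0 else len(parts) + part_num
    return parts[index] if 0 <= index < len(parts) else ""
```

A model like this can serve as an oracle when writing unit tests for the native function: generate inputs, compute the expected value with the Python helper, and compare against the backend's output.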
