GitHub user GlutenPerfBot created a discussion: September 08, 2025: Weekly 
Status Update in Gluten

*This weekly update is generated by LLMs. You're welcome to join our 
[GitHub discussions](https://github.com/apache/incubator-gluten/discussions) for 
in-depth discussion.*

## Overall Activity Summary
The past 7 days saw 38 merged PRs and 29 newly opened PRs across the Velox, 
ClickHouse, Flink, and build/infra areas. The Velox backend dominated with daily 
version bumps, shuffle-read optimizations, and new function enablements. Flink 
activity surged (7 PRs) around Nexmark benchmark support, while Iceberg/Delta 
Lake features and GPU/cuDF connectors are gaining momentum. The community is 
preparing for the Gluten 1.5.0 release with documentation clean-ups and CI 
improvements.

## Key Ongoing Projects
- **Shuffle-read performance overhaul** – @marin-ma merged #10499 to coalesce 
small batches and eliminate `VeloxResizeBatches`, cutting deserialize overhead 
in sort-based shuffle.
- **Iceberg write & partition support** – @jinchengchenghh landed #10497 for 
partition write; #10285 adds Iceberg functions (still fallback).
- **Delta Lake read/write PoC** – @zhztheplayer opened #10216 for Delta 3.3.1 
write and #10639 for DV-enabled TPC-DS table generation.
- **Flink Nexmark completeness** – @shuai-xu & @KevinyhZou enabled q11-q21, 
processing-time windows, and Java-17 compatibility (#10548, #10631, #10572).
- **cuDF/GPU connector** – @jinchengchenghh introduced #10622 for cuDF parquet 
reader and #10621 for GPU connector tracking.
- **Spark 4.0 CI readiness** – @zhouyuan enabled TPC-DS tests on Spark-400 
(#10633) and moved Spark-3.2 tests to nightly (#8961).
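The shuffle-read overhaul above merges many small deserialized batches into fewer large ones so downstream operators pay less per-batch overhead. The following is only an illustrative sketch of that general coalescing idea, not Gluten's actual implementation or API; the function name, the row threshold, and the use of plain lists in place of columnar batches are all assumptions for demonstration:

```python
from typing import Iterable, Iterator, List

def coalesce_batches(batches: Iterable[List[int]],
                     min_rows: int = 4096) -> Iterator[List[int]]:
    """Hypothetical sketch: buffer small batches and emit them merged
    once at least min_rows rows have accumulated."""
    buffer: List[int] = []
    for batch in batches:
        if len(batch) >= min_rows and not buffer:
            # Already-large batch with nothing pending: pass it through.
            yield batch
            continue
        buffer.extend(batch)
        if len(buffer) >= min_rows:
            yield buffer
            buffer = []
    if buffer:  # flush any remaining tail rows
        yield buffer

# Three tiny batches collapse into one batch of 6 rows.
merged = list(coalesce_batches([[1, 2], [3], [4, 5, 6]], min_rows=4))
```

The pass-through branch matters for performance: a batch that is already large should not be copied into the buffer just to be re-emitted.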

## Priority Items
- **Release blockers for 1.5.0** (#10574) – @PHILO-HE is tracking the open 
items: #10603 (config doc tidy-up by @zjuwangg) and #10641 (arrow URL typo by 
@liujiayi771).
- **Critical bug fixes** – #10644 (NumberFormatException in window bounds) by 
@mingyi6666; #10635 (file INSTALL permission) by @beliefer; #10511 (Delta 
column-mapping wrong results) by @sezruby.
- **Memory OOM & stability** – #7249 (global-memory OOM during spill) needs a 
Velox-side fix; #9846 tracks a deadlock in TableScan preload; #9845 (ARM core 
dump) is under investigation.

## Notable Discussions
- #10406: @ryyyyyy1 asks about SARG push-down in Flink and early Velox4j 
initialization; community feedback is wanted.
- #10214: a long thread on high shuffle deserialize time; #10499 has already 
merged, but further tuning is invited.
- #8018: the stage-level ResourceProfile auto-adjust design has been accepted; 
PoC code is to be contributed by @zjuwangg.

## Emerging Trends
1. **Lake-house acceleration** – Iceberg & Delta PRs are appearing daily; 
deletion-vector and column-mapping fixes signal a push toward production 
readiness.
2. **Flink-first development** – Nexmark benchmark now drives Flink function 
parity (UDFs, decimal, time attributes).
3. **GPU/cuDF integration** – Velox cuDF parquet connector merged; GPU memory 
config and connector stubs checked in.
4. **Micro-performance focus** – repeated `identifyBatchType` calls (#10649), 
`StrictRule` simplification (#10553), and hash-table build configs (#10634) all 
target driver-side CPU.
5. **Documentation & CI hygiene** – daily Velox bumps automated, Spark-3.2 
demoted to nightly, Maven enforcer rules relaxed (#10536).

## Good First Issues
- #6814: add ClickHouse expression `MakeYMInterval` – pure CH backend, no 
native changes.
- #4730: implement `date_from_unix_date` for ClickHouse – follow existing date 
function pattern.
- #6807: support `split_part` string function in ClickHouse – straightforward 
string splitting logic.
- #6812: expose `SparkPartitionID` in ClickHouse backend – reuse Spark’s 
partition ID.
- #6815: implement `MapZipWith` for ClickHouse – entry-level map function, good 
for learning CH UDF framework.

All CH good-first issues need basic C++ and ClickHouse function registration 
knowledge; unit tests and documentation updates are expected.
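For implementers picking up these issues, the ClickHouse backend versions need to match Spark's semantics. The sketch below is a hedged behavioral reference (plain Python, not Gluten or ClickHouse code) for two of the listed functions, based on Spark's documented behavior: `split_part` uses a 1-based index, counts from the end when negative, returns an empty string when out of range, and rejects index 0; `date_from_unix_date` converts days since 1970-01-01 to a date:

```python
from datetime import date, timedelta

def split_part(s: str, delim: str, part: int) -> str:
    """Reference sketch of Spark's split_part semantics (1-based index;
    negative counts from the end; out of range yields an empty string)."""
    if part == 0:
        raise ValueError("part number must not be 0")
    parts = s.split(delim)
    idx = part - 1 if part > 0 else len(parts) + part
    return parts[idx] if 0 <= idx < len(parts) else ""

def date_from_unix_date(days: int) -> date:
    """Reference sketch of Spark's date_from_unix_date: days since epoch."""
    return date(1970, 1, 1) + timedelta(days=days)

split_part("11.12.13", ".", 3)   # "13"
split_part("11.12.13", ".", -1)  # "13"
date_from_unix_date(1)           # 1970-01-02
```

A small oracle like this is handy for writing the expected values in the backend unit tests these issues call for.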

GitHub link: https://github.com/apache/incubator-gluten/discussions/10652

----
This is an automatically sent email for [email protected].
To unsubscribe, please send an email to: [email protected]
