GitHub user GlutenPerfBot created a discussion: October 31, 2025: Weekly Status Update in Gluten
*This weekly update is generated by LLMs. You're welcome to join our [GitHub](https://github.com/apache/incubator-gluten/discussions) for in-depth discussions.*

## Overall Activity Summary

The past 7 days saw 32 merged PRs and 20 open PRs, with heavy focus on Velox backend stability, GPU/cuDF acceleration, and daily upstream Velox syncs. Memory-management fixes and shuffle optimizations dominated the bug-fix queue, while new backend proposals (Bolt, Omni) sparked community interest.

## Key Ongoing Projects

- **GPU shuffle & cuDF acceleration** – @jinchengchenghh is leading #10934 and #10933 to move locks out of the iterator constructor, enabling 1 GB GPU batches and concurrent CPU/GPU pipeline preparation.
- **Daily Velox upstream syncs** – @GlutenPerfBot continues the mechanical daily version bumps (#10987, #10985, #10978, #10974, #10962, #10949, #10947, #10946), keeping Gluten in lock-step with facebookincubator/velox.
- **Apache maturity checklist** – @zhztheplayer closed #8018 (release-process docs) and #10377, pushing the podling toward graduation.
- **BHJ hash-table broadcast** – @JkSelf’s 1.6k-line PR #8931 (open, 182 comments) promises a 1.29× speed-up on TPC-DS Q23a and OOM relief for Q24a/b.
- **ClickHouse backend refresh** – @lgbo-ustc posted #10728, the monthly ClickHouse version update.

## Priority Items

- **Memory regression** – #10937 (open) reports that spill can’t be triggered when dynamic off-heap sizing is on; @wForget already has a fix in review (#10936).
- **ORC schema mismatch** – long-standing #5638 (open) breaks reads when an ORC file lacks column names; @ccat3z’s PR #8862 is stalled awaiting upstream Velox reviews.
- **Uniffle shuffle performance** – #10920 (open) flags a regression in 1.5.0 relative to 1.2.0; @wForget’s buffer-size knob #10922 was merged, but more tuning is expected.
- **GCC-13 readiness** – #10926 (open) reminds us that Velox will soon require GCC-13; CI is still on CentOS-7 + GCC-11.

## Notable Discussions

- #10929: @WangGuangxin proposes “Bolt”, a ByteDance Velox fork with JIT and OOM hardening, asking for guidance on upstreaming it as a new Gluten backend.
- #10188: @wjunLu presents “Omni”, an ARM-optimized backend showing a 70% TPC-DS speed-up; the team offers ARM CI resources if merged.

## Emerging Trends

- **GPU-first features** – cuDF validation, a GPU shuffle reader, and cuDF library pre-installs are landing almost daily, signaling a shift from CPU-only Velox.
- **Memory-management churn** – three separate dynamic off-heap fixes in one week suggest the feature is newly stressed in production.
- **Multi-backend ecosystem** – with the Bolt and Omni proposals, Gluten is evolving into a thin meta-layer over pluggable native engines.
- **Graduation push** – documentation PRs for the release process, security pages, and PMC lists indicate serious TLP submission prep for Q4 2025.

## Good First Issues

- #6814: Add ClickHouse expression MakeYMInterval – pure CH backend, no Velox/C++ needed.
- #4730: Implement date_from_unix_date for ClickHouse – similar pattern to existing date functions.
- #6807: Add split_part string function for ClickHouse – well-scoped, CH-only.
- #6812: Implement SparkPartitionID for ClickHouse – single-row function, good intro to the CH function registry.
- #6815: Add MapZipWith higher-order function for ClickHouse – slightly larger, but excellent for learning CH’s lambda infrastructure.

All issues above are labeled “good first issue”, touch only the ClickHouse backend, and have clear function signatures, making them ideal entry points for new contributors comfortable with Java/Scala or C++ in a ClickHouse context.

GitHub link: https://github.com/apache/incubator-gluten/discussions/10995

----
This is an automatically sent email for [email protected].
To unsubscribe, please send an email to: [email protected]
