GitHub user GlutenPerfBot created a discussion: January 02, 2026: Weekly Status 
Update in Gluten

*This weekly update is generated by LLMs. You're welcome to join our 
[GitHub](https://github.com/apache/incubator-gluten/discussions) for in-depth 
discussions.*

## Overall Activity Summary
The Gluten community closed 7 PRs and advanced 19 open ones, with heavy focus 
on Spark 4.1 readiness, Velox daily bumps, and CI/build hygiene. Spark-3.2 
deprecation and ARM/neoverse-v2 build fixes are now tracked issues.

## Key Ongoing Projects
- **Spark 4.1 support** – @baibaichen opened the shim layer (#11347) and a 
compatibility mega-fix (#11313, now merged) covering geospatial types, 
commons-collections 4.x, and test-environment alignment.
- **Velox version cadence** – daily bumps by @jinchengchenghh (#11348, #11337, 
#11349) keep Gluten pinned to the latest IBM/velox commits; all were merged 
within hours.
- **New backend “Bolt”** – @WangGuangxin posted an early 260k-line draft 
(#11261) introducing a fourth execution engine.
- **SVE performance** – @chiranmoyh’s draft (#11045) shows a 2× speed-up for 
SparkFloorFunction on Graviton3.

## Priority Items
- **Memory leak in ColumnarPartialProject** (#11336) – reported by 
@liujiayi771; blocks deployments using the popular partial-project flag.
- **HashJoin hang on TPC-DS 10 TB** (#11335) – @xiaojie19852006 reports a 3× 
slowdown vs. vanilla Spark; needs profiler attention.
- **CSV/Parquet test failures under Spark 4.0** – umbrella issue #11088 still 
has ~10 suites unclaimed; help wanted.
- **ARM build failure on CentOS 8** (#9858) – GCC 11 lacks the neoverse-v2 
flag; awaits a volunteer to patch the Velox detection script.

## Notable Discussions
- **PME (Parquet Modular Encryption) support** (#11338) – user @Dormant7 asks 
whether encrypted Parquet files can be read; no answer yet; security-savvy 
contributors welcome.

## Emerging Trends
1. Spark-version velocity: 4.0 tests nearly complete, 4.1 shim landed within 
days of upstream release.
2. Daily Velox pin-and-bump process is now zero-touch via @GlutenPerfBot and 
@jinchengchenghh.
3. Build/CI hygiene is trending: clang-tidy (#11120), IWYU (#11287), and Maven 
wrapper adoption (#11332) all aim to cut environment drift.
4. Performance work shifting to vectorization (SVE) and memory-leak hunting 
rather than new operators.

## Good First Issues
- #8960 **Remove Spark-3.2 unit-test leftovers** – grep and delete stale 
suites; no C++/Java changes required.
- #10275 **Document gaps in from_json/to_json support** – update docs when 
functions fall back to Spark; good for doc/QA contributors.
- #9184 **Add S3/HDFS/GCS/ABFS integration tests** – extend existing GHA build 
checks with mini-cluster/docker tests; infra skills helpful.
- #6814 **Implement MakeYMInterval expression for ClickHouse backend** – pure 
C++ registration and unit test; mirrors existing date functions.
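
For #8960, the "grep" step could start from a small script that lists candidate 
files for manual review. This is only a minimal sketch: the marker pattern 
(`spark32`, `Spark-3.2`, and similar), the file extensions, and the function 
name `find_stale_candidates` are assumptions for illustration and are not taken 
from the Gluten repository.

```python
import os
import re

# Assumed markers for Spark 3.2 leftovers (e.g. "spark32", "Spark-3.2").
# Adjust after inspecting how the repository actually names them.
STALE_PATTERN = re.compile(r"spark[-_.]?3\.?2(?!\d)", re.IGNORECASE)

# Assumed test-source extensions.
EXTENSIONS = (".scala", ".java")


def find_stale_candidates(root):
    """Return sorted paths under `root` whose relative path or text matches."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(EXTENSIONS):
                continue
            path = os.path.join(dirpath, name)
            # Match on the path relative to `root` so the checkout
            # location itself cannot trigger a hit.
            if STALE_PATTERN.search(os.path.relpath(path, root)):
                hits.append(path)
                continue
            with open(path, encoding="utf-8", errors="replace") as fh:
                if STALE_PATTERN.search(fh.read()):
                    hits.append(path)
    return sorted(hits)
```

Calling `find_stale_candidates("path/to/checkout")` returns a list of paths. 
A match only flags a file for review; it does not prove the suite is stale, so 
each hit should be checked by hand before deletion.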

GitHub link: https://github.com/apache/incubator-gluten/discussions/11350

----
This is an automatically sent email for [email protected].
To unsubscribe, please send an email to: [email protected]

