GitHub user GlutenPerfBot created a discussion: January 02, 2026: Weekly Status Update in Gluten
*This weekly update is generated by LLMs. You're welcome to join our [GitHub](https://github.com/apache/incubator-gluten/discussions) for in-depth discussions.*

## Overall Activity Summary

The Gluten community closed 7 PRs and advanced 19 open ones, with heavy focus on Spark 4.1 readiness, Velox daily bumps, and CI/build hygiene. Spark-3.2 deprecation and ARM/neoverse-v2 build fixes are now tracked issues.

## Key Ongoing Projects

- **Spark 4.1 support** – @baibaichen opened the shim layer (#11347) and a compatibility mega-fix (#11313, now merged) covering geospatial types, commons-collections 4.x, and test-environment alignment.
- **Velox version cadence** – daily bumps by @jinchengchenghh (#11348, #11337, #11349) keep Gluten pinned to the latest IBM/velox commits; all merged within hours.
- **New backend “Bolt”** – @WangGuangxin posted an early 260k-line draft (#11261) introducing a fourth execution engine.
- **SVE performance** – @chiranmoyh’s draft (#11045) shows a 2× speed-up for SparkFloorFunction on Graviton3.

## Priority Items

- **Memory leak in ColumnarPartialProject** (#11336) – reported by @liujiayi771; blocks deployments using the popular partial-project flag.
- **HashJoin hang on TPC-DS 10 TB** (#11335) – @xiaojie19852006 sees a 3× slowdown vs. vanilla Spark; needs profiler attention.
- **CSV/Parquet test failures under Spark 4.0** – umbrella issue #11088 still has ~10 suites unclaimed; help wanted.
- **ARM build failure on CentOS 8** (#9858) – GCC-11 lacks the neoverse-v2 flag; awaits a volunteer to patch the Velox detection script.

## Notable Discussions

- **PME (Parquet Modular Encryption) support** (#11338) – user @Dormant7 asks whether encrypted Parquet files can be read; no answer yet; security-savvy contributors welcome.

## Emerging Trends

1. Spark-version velocity: 4.0 tests nearly complete, 4.1 shim landed within days of the upstream release.
2. The daily Velox pin-and-bump process is now zero-touch via @GlutenPerfBot and @jinchengchenghh.
3. Build/CI hygiene trending: clang-tidy (#11120), IWYU (#11287), and Maven wrapper adoption (#11332) to cut environment drift.
4. Performance work is shifting to vectorization (SVE) and memory-leak hunting rather than new operators.

## Good First Issues

- #8960 **Remove Spark-3.2 unit-test leftovers** – grep and delete stale suites; no C++/Java changes required.
- #10275 **Document gaps in from_json/to_json support** – update docs when functions fall back to Spark; good for doc/QA contributors.
- #9184 **Add S3/HDFS/GCS/ABFS integration tests** – extend existing GHA build checks with mini-cluster/docker tests; infra skills helpful.
- #6814 **Implement MakeYMInterval expression for ClickHouse backend** – pure C++ registration and unit test; mirrors existing date functions.

GitHub link: https://github.com/apache/incubator-gluten/discussions/11350
