brijrajk opened a new pull request, #12374: URL: https://github.com/apache/gluten/pull/12374
## Problem

`GlutenTPCHPlanStabilitySuite` → `tpch/q19` has been failing in `spark-test-spark40` CI runs.

### Root cause

`GlutenPlanStabilitySuite.glutenNormalizeIds()` uses the regex `(?<prefix>(?<!id=)#)\\d+L?` to normalize ExprIds in explain plans. This regex matches **any** `#<number>` occurrence that is not preceded by `id=`, including TPC-H string literals.

The `p_brand` filter in q19 uses values `Brand#11`, `Brand#12`, `Brand#13` (actual data values from the TPC-H spec). These appear unquoted in the explain text:

```
EqualTo(p_brand, Brand#12)
```

The normalizer incorrectly treats `#12` here as an ExprId and remaps it sequentially. The result depends on how many unique `#N` patterns were seen before this point in the plan, which changes whenever new optimizer rules or expressions are added to the codebase.

> Note: The suite code itself warns about this at lines 67–68:
> *"Running all suites together in one JVM is recommended to avoid ExprId normalization issues where string constants (e.g., Brand#23 in TPCH q19) may collide with ExprId numbers."*

### What changed

The golden file was committed in #11805 (2026-03-24). Since then **264 commits** landed on `main`, adding new rules and expressions that shifted the ExprId counter. Now `Brand#12` normalizes to `Brand#6` and internal IDs like `_pre_1#14` shift to `_pre_1#13`.

**Exact diff (original vs current):**

```
- EqualTo(p_brand, Brand#12) ... Brand#13
+ EqualTo(p_brand, Brand#6) ... Brand#12
- _pre_1#14 / sum#15 / isEmpty#16
+ _pre_1#13 / sum#14 / isEmpty#15
```

### Evidence that this is pre-existing (not introduced by any recent PR)

Ran `GlutenTPCHPlanStabilitySuite` on `main` (without any pending PRs applied):

```
Tests: succeeded 21, failed 1, canceled 0, ignored 0, pending 0
*** 1 TEST FAILED ***   ← tpch/q19
```

Then regenerated with `SPARK_GENERATE_GOLDEN_FILES=1` and re-ran:

```
Tests: succeeded 22, failed 0, canceled 0, ignored 0, pending 0
BUILD SUCCESS
```

Only `q19/explain.txt` changed.
`simplified.txt` and all other queries (q1–q18, q20–q22) are unaffected: the plan structure is correct, and only the ExprId numbering in the explain output shifted.

## Fix

Regenerated `q19/explain.txt` by running `GlutenTPCHPlanStabilitySuite` with `SPARK_GENERATE_GOLDEN_FILES=1 SPARK_ANSI_SQL_MODE=false`.

A proper long-term fix would be to make `glutenNormalizeIds` skip `#N` occurrences inside string literal contexts, but that is a separate infrastructure change.

## Impact

- Only `gluten-ut/spark40/src/test/resources/backends-velox/gluten-tpch-plan-stability/q19/explain.txt` changes
- No production code changes
- No other test queries affected
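
For reviewers who want to see the drift described under "Root cause" in isolation, here is a minimal Python sketch. It is not the suite's code: `normalize_ids` is a hypothetical stand-in that assumes the normalizer renumbers matches sequentially by first appearance, the regex is rewritten in Python's named-group syntax, the two plan strings are made up, and `LITERAL_SAFE_RE` is only one possible shape for a literal-aware pattern.

```python
import re

# Python spelling of the regex quoted in the PR: Java's (?<prefix>...) named
# group is written (?P<prefix>...) in Python; the rest is unchanged.
NORMALIZE_RE = re.compile(r"(?P<prefix>(?<!id=)#)\d+L?")

# Hypothetical variant (not from the PR): also refuse to match when the '#'
# directly follows the literal text "Brand", so TPC-H brand values survive.
LITERAL_SAFE_RE = re.compile(r"(?P<prefix>(?<!id=)(?<!Brand)#)\d+L?")


def normalize_ids(plan, pattern=NORMALIZE_RE):
    """Renumber every match sequentially, in order of first appearance.

    Assumed behaviour for illustration; the real glutenNormalizeIds may differ.
    """
    mapping = {}
    for match in pattern.finditer(plan):
        # The first distinct token becomes 1, the second 2, and so on.
        mapping.setdefault(match.group(0), str(len(mapping) + 1))
    return pattern.sub(lambda m: m.group("prefix") + mapping[m.group(0)], plan)


# Made-up plan fragments, not taken from the golden file.
before = "Project [p_partkey#1]\nFilter EqualTo(p_brand#2, Brand#12)"
# Same filter, but an extra expression now appears earlier in the plan.
after = "Project [p_partkey#1, tmp#7]\nFilter EqualTo(p_brand#2, Brand#12)"

print(normalize_ids(before).splitlines()[1])
# Filter EqualTo(p_brand#2, Brand#3)
print(normalize_ids(after).splitlines()[1])
# Filter EqualTo(p_brand#3, Brand#4)
print(normalize_ids(after, LITERAL_SAFE_RE).splitlines()[1])
# Filter EqualTo(p_brand#3, Brand#12)
```

The first two outputs show the string literal being renumbered differently once an unrelated ID appears earlier in the plan. The third shows the literal left intact, although hard-coding `Brand` is only a stopgap and would not cover other literals that contain `#N`.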
