[PR] [UT][VL] Refresh TPC-H q19 plan stability golden file [gluten]

via GitHub Thu, 25 Jun 2026 20:52:27 -0700


brijrajk opened a new pull request, #12374:
URL: https://github.com/apache/gluten/pull/12374

## Problem

`GlutenTPCHPlanStabilitySuite` → `tpch/q19` has been failing in
`spark-test-spark40` CI runs.

### Root cause

`GlutenPlanStabilitySuite.glutenNormalizeIds()` uses the regex
`(?<prefix>(?<!id=)#)\\d+L?` to normalize ExprIds in explain plans. This regex
matches **any** `#<number>` occurrence that is not preceded by `id=`, including
`#<number>` sequences inside TPC-H string literals. The `p_brand` filter in q19
uses values `Brand#11`, `Brand#12`, `Brand#13` (actual data values from the
TPC-H spec). These appear unquoted in the explain text:

```
EqualTo(p_brand, Brand#12)
```

The normalizer incorrectly treats `#12` here as an ExprId and remaps it
sequentially. The result depends on how many unique `#N` patterns were seen
before this point in the plan — which changes whenever new optimizer rules or
expressions are added to the codebase.
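
The renumbering described above can be reproduced with a minimal Python sketch. This is not the suite's Scala code: the regex is respelled in Python's named-group syntax, `normalize_ids` and the plan strings are hypothetical, and the sketch assumes IDs are renumbered in order of first appearance.

```python
import re

# Python spelling of the regex quoted above; Python names groups with (?P<name>...).
ID_PATTERN = re.compile(r"(?P<prefix>(?<!id=)#)\d+L?")


def normalize_ids(plan: str) -> str:
    """Replace each distinct #<number> token with its 1-based order of first appearance.

    Assumption: this mirrors the sequential remapping described in the PR text;
    it is an illustration, not the actual glutenNormalizeIds implementation.
    """
    mapping = {}
    for m in ID_PATTERN.finditer(plan):
        mapping.setdefault(m.group(0), f"#{len(mapping) + 1}")
    return ID_PATTERN.sub(lambda m: mapping[m.group(0)], plan)


# Hypothetical plan fragments: the second one has one extra expression upstream.
before = "Project [p_partkey#101L]\nFilter EqualTo(p_brand, Brand#12)"
after = "Project [p_partkey#101L, extra#102]\nFilter EqualTo(p_brand, Brand#12)"

print(normalize_ids(before))  # the literal is rewritten to Brand#2
print(normalize_ids(after))   # the same literal is now rewritten to Brand#3
```

In this sketch the literal `Brand#12` never changes in the input, yet its normalized form moves from `Brand#2` to `Brand#3` as soon as one more ID appears ahead of it, which is the kind of drift the golden file picked up.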

> Note: The suite code itself warns about this at lines 67–68:
> *"Running all suites together in one JVM is recommended to avoid ExprId
> normalization issues where string constants (e.g., Brand#23 in TPCH q19) may
> collide with ExprId numbers."*

### What changed

The golden file was committed in #11805 (2026-03-24). Since then, **264
commits** have landed on `main`, adding new rules and expressions that shifted
the ExprId counter. Now `Brand#12` normalizes to `Brand#6`, and internal IDs
like `_pre_1#14` shift to `_pre_1#13`.

**Exact diff (original vs current):**
```
- EqualTo(p_brand, Brand#12) ... Brand#13
+ EqualTo(p_brand, Brand#6) ... Brand#12

- _pre_1#14 / sum#15 / isEmpty#16
+ _pre_1#13 / sum#14 / isEmpty#15
```

### Evidence that this is pre-existing (not introduced by any recent PR)

Ran `GlutenTPCHPlanStabilitySuite` on `main` (without any pending PRs
applied):

```
Tests: succeeded 21, failed 1, canceled 0, ignored 0, pending 0
*** 1 TEST FAILED *** ← tpch/q19
```

Then regenerated with `SPARK_GENERATE_GOLDEN_FILES=1` and re-ran:

```
Tests: succeeded 22, failed 0, canceled 0, ignored 0, pending 0
BUILD SUCCESS
```

Only `q19/explain.txt` changed. `simplified.txt` and all other queries
(q1–q18, q20–q22) are unaffected — the plan structure is correct, only the
ExprId numbering in the explain output shifted.

## Fix

Regenerated `q19/explain.txt` by running `GlutenTPCHPlanStabilitySuite` with
`SPARK_GENERATE_GOLDEN_FILES=1 SPARK_ANSI_SQL_MODE=false`.

A proper long-term fix would be to make `glutenNormalizeIds` skip `#N`
occurrences inside string literal contexts, but that is a separate
infrastructure change.
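
As a rough illustration of what such a change could look like, here is a minimal Python sketch, not a proposed patch. It assumes the only colliding literals are of the form `Brand#N` and adds a second negative lookbehind for that prefix; `SAFE_ID_PATTERN`, `normalize_ids_safe`, and the plan string are hypothetical names and data.

```python
import re

# Hypothetical variant of the normalizer regex: besides the existing id= guard,
# it also refuses to match a '#' that directly follows the literal prefix "Brand".
SAFE_ID_PATTERN = re.compile(r"(?<!id=)(?<!Brand)#\d+L?")


def normalize_ids_safe(plan: str) -> str:
    """Renumber #<number> tokens by first appearance, leaving Brand#N untouched."""
    mapping = {}
    for m in SAFE_ID_PATTERN.finditer(plan):
        mapping.setdefault(m.group(0), f"#{len(mapping) + 1}")
    return SAFE_ID_PATTERN.sub(lambda m: mapping[m.group(0)], plan)


# Hypothetical plan fragment mixing real attribute IDs with a brand literal.
plan = "Project [p_partkey#101L, extra#102]\nFilter EqualTo(p_brand#7, Brand#12)"
print(normalize_ids_safe(plan))
```

A prefix guard like this is narrow by design: it protects one known literal shape and would still misfire on any other string constant containing `#<number>`, so a general fix would need real knowledge of where literals sit in the plan text.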

## Impact

- Only `gluten-ut/spark40/src/test/resources/backends-velox/gluten-tpch-plan-stability/q19/explain.txt` changes
- No production code changes
- No other test queries affected
