baibaichen opened a new pull request, #11805:
URL: https://github.com/apache/gluten/pull/11805
### What changes were proposed in this pull request?
Follow-up to #11799, which fixed the PlanStability test suites to properly load
the Gluten plugin and validate plans via `ValidateRequirements`.
This PR adds **Gluten-specific golden file comparison** to detect unintended
plan regressions, similar to Spark's built-in `PlanStabilitySuite` mechanism.
#### Key changes:
1. **Full golden file support in `GlutenPlanStabilityTestTrait`**:
   - `getSimplifiedPlan` adapted for Gluten Transformer nodes (`ColumnarExchange`, `ColumnarBroadcastExchange`, `ColumnarSubqueryBroadcast`)
   - `normalizeIds` extended to handle `_pre_N` expression names from Gluten's pre-projection optimization
   - `checkWithApproved` / `generateGoldenFile` for plan comparison and generation
   - `SPARK_GENERATE_GOLDEN_FILES=1` triggers golden file regeneration
2. **Backend-aware golden file paths**: Golden files are stored under
`backends-{backendName}/` (e.g.,
`backends-velox/tpcds-plan-stability/gluten-approved-plans-v1_4/q1/`),
supporting multi-backend (Velox / ClickHouse) out of the box via
`BackendsApiManager.getBackendName`.
3. **Test flow**: `testQuery()` first validates the plan with
`ValidateRequirements.validate(plan)`, then compares it against the golden files
(or generates them when `SPARK_GENERATE_GOLDEN_FILES=1` is set).
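
To illustrate the `_pre_N` normalization and the backend-aware layout, here is a minimal sketch. It is not the code in this PR: the object and method names (`PlanNormalizationSketch`, `normalizePreNames`, `goldenDir`) are hypothetical, the renumber-by-first-appearance strategy is an assumption about how such names could be made stable, and the backend name is passed in as a plain string instead of calling `BackendsApiManager.getBackendName`.

```scala
import java.nio.file.Path
import scala.collection.mutable
import scala.util.matching.Regex

// Hypothetical helper object, not part of Gluten.
object PlanNormalizationSketch {
  // Assumption: pre-projection aliases have the form `_pre_<digits>`.
  private val PreName: Regex = """_pre_\d+""".r

  /** Renumbers `_pre_N` names in order of first appearance (1, 2, ...),
    * so that two runs assigning different N values yield the same text. */
  def normalizePreNames(plan: String): String = {
    val seen = mutable.LinkedHashMap.empty[String, Int]
    PreName.replaceAllIn(plan, m => {
      // The default is evaluated before insertion, so the first name maps to 1.
      val id = seen.getOrElseUpdate(m.matched, seen.size + 1)
      Regex.quoteReplacement(s"_pre_$id")
    })
  }

  /** Builds `<root>/backends-<backendName>/<suiteDir>/<query>`. */
  def goldenDir(root: Path, backendName: String, suiteDir: String, query: String): Path =
    root.resolve(s"backends-$backendName").resolve(suiteDir).resolve(query)
}
```

With these assumptions, `normalizePreNames("Project [_pre_7, _pre_3, _pre_7]")` maps both occurrences of `_pre_7` to the same normalized name, and `goldenDir(root, "velox", "tpcds-plan-stability/gluten-approved-plans-v1_4", "q1")` reproduces the example path from item 2 under `root`.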
### How was this patch tested?
All 7 test suites pass on both Spark 4.0 and 4.1 with golden file comparison:
| Suite | Tests | Spark 4.0 | Spark 4.1 |
|-------|-------|-----------|-----------|
| `GlutenTPCDSV1_4_PlanStabilitySuite` | 97 | ✅ | ✅ |
| `GlutenTPCDSV1_4_PlanStabilityWithStatsSuite` | 97 | ✅ | ✅ |
| `GlutenTPCDSV2_7_PlanStabilitySuite` | 32 | ✅ | ✅ |
| `GlutenTPCDSV2_7_PlanStabilityWithStatsSuite` | 32 | ✅ | ✅ |
| `GlutenTPCDSModifiedPlanStabilitySuite` | 21 | ✅ | ✅ |
| `GlutenTPCDSModifiedPlanStabilityWithStatsSuite` | 21 | ✅ | ✅ |
| `GlutenTPCHPlanStabilitySuite` | 22 | ✅ | ✅ |
Golden files were generated with `SPARK_GENERATE_GOLDEN_FILES=1`, then verified
without it; all plans match.
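
The generate-then-verify cycle can be sketched roughly as follows. This is an assumption-laden illustration, not the PR's `checkWithApproved` / `generateGoldenFile` implementation: `GoldenFileCheckSketch` and `checkOrGenerate` are hypothetical names, and the comparison here is a plain string equality on the file contents.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

// Hypothetical helper object, not part of Gluten.
object GoldenFileCheckSketch {
  /** True when the environment flag named in this PR is set to "1". */
  def regenerateRequested: Boolean =
    sys.env.get("SPARK_GENERATE_GOLDEN_FILES").contains("1")

  /** Writes the plan when regenerating; otherwise compares it with the
    * approved file and fails the assertion on any difference. */
  def checkOrGenerate(goldenFile: Path, actualPlan: String, regenerate: Boolean): Unit = {
    if (regenerate) {
      // Create the per-query directory if it does not exist yet.
      Option(goldenFile.getParent).foreach(p => Files.createDirectories(p))
      Files.write(goldenFile, actualPlan.getBytes(StandardCharsets.UTF_8))
    } else {
      val approved = new String(Files.readAllBytes(goldenFile), StandardCharsets.UTF_8)
      assert(approved == actualPlan, s"Plan differs from golden file $goldenFile")
    }
  }
}
```

A test would call `checkOrGenerate(file, plan, GoldenFileCheckSketch.regenerateRequested)`: the first run with the flag set writes the file, and later runs without it fail as soon as the produced plan drifts from the approved one.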