baibaichen opened a new pull request, #11727: URL: https://github.com/apache/incubator-gluten/pull/11727
## What changes are proposed in this pull request?

Fix struct join key validation and enable `GlutenDataFrameSubquerySuite` for Spark 4.1.

Spark 4.1 added the `isin(Dataset)` API, which creates struct IN subquery predicates. These are converted to `BroadcastHashJoin` with struct-typed join keys where field names may differ between sides (e.g., `struct(a, b)` vs `struct(c, d)`).

**Changes:**

1. **`JoinExecTransformer.scala`**: Remove struct field name comparison in `sameType()` to align with Spark's `DataType.equalsStructurally(ignoreNullability=true)` semantics. Spark's native `HashJoin` only checks structural type compatibility (field count + field types by position), not field names. This allows struct IN subqueries to be offloaded to Velox natively.
2. **`Validators.scala`**: Add try-catch in `FallbackByNativeValidation.offloadAttempt` to catch exceptions during transformer construction and return graceful fallback instead of crashing.
3. **`VeloxTestSettings.scala`**: Enable `GlutenDataFrameSubquerySuite` for Spark 4.1 (was previously disabled with `// TODO: 4.x`).

## How was this patch tested?

- `GlutenDataFrameSubquerySuite` on Spark 4.1: all 47 tests passed (struct IN subqueries offloaded to Velox natively)
- `GlutenDataFrameSubquerySuite` on Spark 4.0: all 40 tests passed (no regression)

## Was this patch authored or co-authored using generative AI tooling?

Generated-by: GitHub Copilot
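For readers unfamiliar with the structural comparison described in change 1, here is a minimal, self-contained sketch of what "field count + field types by position, ignoring names and nullability" could look like. It is not the code in `JoinExecTransformer.scala`; the `DType`, `Field`, `StructT`, `ArrayT` types and the `sameShape` function are hypothetical stand-ins for Spark's `DataType` hierarchy, chosen so the example runs without a Spark dependency.

```scala
// Hypothetical, simplified stand-ins for Spark's DataType hierarchy.
sealed trait DType
case object IntT extends DType
case object StringT extends DType
final case class Field(name: String, dtype: DType, nullable: Boolean = true)
final case class StructT(fields: Seq[Field]) extends DType
final case class ArrayT(element: DType) extends DType

object StructuralTypes {
  // Two types have the same shape when they match position by position;
  // struct field names and nullability flags are deliberately ignored.
  def sameShape(left: DType, right: DType): Boolean = (left, right) match {
    case (StructT(l), StructT(r)) =>
      l.length == r.length &&
        l.zip(r).forall { case (a, b) => sameShape(a.dtype, b.dtype) }
    case (ArrayT(l), ArrayT(r)) => sameShape(l, r)
    case (l, r)                 => l == r // leaf types must be identical
  }
}
```

Under this model, `struct(a: int, b: string)` and `struct(c: int, d: string)` compare as compatible join keys, while structs with a different field count or with field types in a different order do not.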

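The graceful-fallback idea in change 2 can be illustrated with a small sketch as well. This is an assumption-laden model, not the actual `FallbackByNativeValidation.offloadAttempt` implementation: `OffloadResult`, `Offloaded`, `Fallback`, and `SafeOffload.attempt` are hypothetical names, the plan is represented as a plain `String`, and catching only `NonFatal` exceptions is an illustrative choice, since the PR description does not say which exception types are caught.

```scala
import scala.util.control.NonFatal

// Hypothetical result type: either a native plan or a reason to fall back.
sealed trait OffloadResult
final case class Offloaded(plan: String) extends OffloadResult
final case class Fallback(reason: String) extends OffloadResult

object SafeOffload {
  // `build` is evaluated lazily inside the try, so an exception thrown while
  // constructing the transformer becomes a Fallback instead of propagating.
  def attempt(nodeName: String)(build: => String): OffloadResult =
    try Offloaded(build)
    catch {
      case NonFatal(e) =>
        Fallback(s"Failed to construct transformer for $nodeName: ${e.getMessage}")
    }
}
```

With this shape, a caller can pattern match on the result and keep the original Spark operator when a `Fallback` is returned, rather than failing the whole query.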