baibaichen opened a new pull request, #11727:
URL: https://github.com/apache/incubator-gluten/pull/11727

   ## What changes are proposed in this pull request?
   
   Fix struct join key validation and enable `GlutenDataFrameSubquerySuite` for 
Spark 4.1.
   
Spark 4.1 added the `isin(Dataset)` API, which creates struct IN subquery 
predicates. These are converted to `BroadcastHashJoin` operators with struct-typed 
join keys whose field names may differ between the two sides (e.g., `struct(a, b)` 
vs `struct(c, d)`).
   
   **Changes:**
   
   1. **`JoinExecTransformer.scala`** — Remove struct field name comparison in 
`sameType()` to align with Spark's 
`DataType.equalsStructurally(ignoreNullability=true)` semantics. Spark's native 
`HashJoin` only checks structural type compatibility (field count + field types 
by position), not field names. This allows struct IN subqueries to be offloaded 
to Velox natively.
   
   2. **`Validators.scala`** — Add a try-catch in 
`FallbackByNativeValidation.offloadAttempt` to catch exceptions thrown during 
transformer construction and return a graceful fallback instead of crashing query 
planning.
   
   3. **`VeloxTestSettings.scala`** — Enable `GlutenDataFrameSubquerySuite` for 
Spark 4.1 (was previously disabled with `// TODO: 4.x`).
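   The structural comparison described in change 1 can be sketched with a minimal, 
self-contained model (the `DType` ADT and `sameType` below are illustrative 
stand-ins, not Gluten's or Spark's actual classes): struct types compare by field 
count and positional field types, ignoring field names, mirroring 
`DataType.equalsStructurally(ignoreNullability = true)`.

```scala
// Illustrative model of Spark-style data types (not the real classes).
sealed trait DType
case object IntT extends DType
case object StringT extends DType
case class StructT(fields: Seq[(String, DType)]) extends DType

// Structural equality: struct field names are ignored; only field count
// and positional field types are compared, as in
// DataType.equalsStructurally(ignoreNullability = true).
def sameType(a: DType, b: DType): Boolean = (a, b) match {
  case (StructT(fa), StructT(fb)) =>
    fa.length == fb.length &&
      fa.zip(fb).forall { case ((_, ta), (_, tb)) => sameType(ta, tb) }
  case _ => a == b
}

// struct(a, b) vs struct(c, d): same shape, different names => compatible.
val lhs = StructT(Seq("a" -> IntT, "b" -> StringT))
val rhs = StructT(Seq("c" -> IntT, "d" -> StringT))
```

Under these semantics the join keys produced by `isin(Dataset)` validate 
successfully even though the field names differ between sides.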
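   Change 2 can be sketched as follows (`ValidationResult`, its cases, and the 
`offloadAttempt` signature here are hypothetical simplifications of the Gluten 
code, shown only to illustrate the catch-and-fall-back pattern):

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical simplification of Gluten's validation result type.
sealed trait ValidationResult
case object Succeeded extends ValidationResult
case class Fallback(reason: String) extends ValidationResult

// Wrap transformer construction so an exception becomes a graceful
// fallback to vanilla Spark instead of crashing query planning.
def offloadAttempt(buildTransformer: () => Any): ValidationResult =
  Try(buildTransformer()) match {
    case Success(_) => Succeeded
    case Failure(e) => Fallback(s"native validation failed: ${e.getMessage}")
  }
```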
   
   ## How was this patch tested?
   
   - `GlutenDataFrameSubquerySuite` on Spark 4.1: all 47 tests passed (struct 
IN subqueries offloaded to Velox natively)
   - `GlutenDataFrameSubquerySuite` on Spark 4.0: all 40 tests passed (no 
regression)
   
   ## Was this patch authored or co-authored using generative AI tooling?
   
   Generated-by: GitHub Copilot


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
