baibaichen opened a new pull request, #11723:
URL: https://github.com/apache/incubator-gluten/pull/11723

   ## What changes are proposed in this pull request?
   
   Enable `GlutenParquetVariantShreddingSuite` for Spark 4.1 by adding fallback 
logic for Parquet variant logical type annotations.
   
   Spark 4.1 introduced Parquet variant logical type annotations 
(`PARQUET_ANNOTATE_VARIANT_LOGICAL_TYPE`, `PARQUET_IGNORE_VARIANT_ANNOTATION`). 
When reading a variant-annotated Parquet file with a non-variant schema, 
Spark's `ParquetSchemaConverter` validates the annotation and throws an error. 
The Velox native reader does not check variant annotations, so the scan must 
fall back to vanilla Spark to preserve this behavior.
   
   Changes:
   - Add `shouldFallbackForParquetVariantAnnotation` shim method in 
`SparkShims` (default: `false`)
   - Implement variant annotation detection in `Spark41Shims`: it checks the 
`PARQUET_IGNORE_VARIANT_ANNOTATION` config and recursively scans the Parquet 
schema for `VariantLogicalTypeAnnotation`
   - Add `validateVariantAnnotation` in `ParquetMetadataUtils` (not gated by 
`parquetMetadataValidationEnabled` since this is a correctness issue)
   - Call `validateVariantAnnotation` from `VeloxBackend.validateMetadata()`
   - Enable `GlutenParquetVariantShreddingSuite` in spark41 `VeloxTestSettings`
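
The detection step described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: `SchemaNode` is a hypothetical stand-in for a Parquet schema type (it is not the parquet-java API), the `variantAnnotated` flag stands in for "this node carries a `VariantLogicalTypeAnnotation`", and the rule "fall back only when the ignore config is off and an annotation is found" is an assumption drawn from the description.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a Parquet schema node; NOT the parquet-java API.
final class SchemaNode {
    final String name;
    // Stands in for "the node's logical type annotation is VariantLogicalTypeAnnotation".
    final boolean variantAnnotated;
    final List<SchemaNode> children;

    SchemaNode(String name, boolean variantAnnotated, SchemaNode... children) {
        this.name = name;
        this.variantAnnotated = variantAnnotated;
        this.children = Arrays.asList(children);
    }
}

final class VariantFallbackSketch {
    // Depth-first scan: true if this node or any descendant is variant-annotated.
    static boolean containsVariantAnnotation(SchemaNode node) {
        if (node.variantAnnotated) {
            return true;
        }
        for (SchemaNode child : node.children) {
            if (containsVariantAnnotation(child)) {
                return true;
            }
        }
        return false;
    }

    // Assumed rule: when the annotation is ignored, vanilla Spark would not
    // validate it, so no fallback is needed; otherwise fall back whenever
    // the file schema contains a variant annotation anywhere.
    static boolean shouldFallback(boolean ignoreVariantAnnotation, SchemaNode fileSchema) {
        return !ignoreVariantAnnotation && containsVariantAnnotation(fileSchema);
    }
}
```

In this sketch, a schema with a variant-annotated group nested inside a struct triggers a fallback when the ignore flag is off, while a schema without annotations, or any schema read with the ignore flag on, stays on the native path.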
   
   ## How was this patch tested?
   
   - spark41 `GlutenParquetVariantShreddingSuite`: **7/7 passed** (including 
the previously failing `"variant logical type annotation - ignore variant 
annotation"`)
   - spark40 `GlutenParquetVariantShreddingSuite`: **5/5 passed** (no 
regression)
   
   ## Was this patch authored or co-authored using generative AI tooling?
   
   Generated-by: GitHub Copilot (Claude)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

