james-willis opened a new issue, #2704: URL: https://github.com/apache/sedona/issues/2704
## Expected behavior

On Spark 4.1+, the `TransformNestedUDTParquet` optimizer rule should not be registered, since the root cause it works around (SPARK-48942) has been fixed natively by [SPARK-52651](https://issues.apache.org/jira/browse/SPARK-52651).

## Actual behavior

The `TransformNestedUDTParquet` rule is unconditionally registered on all Spark versions, including 4.1+ where it is unnecessary. While not a crash bug, it adds an unnecessary optimizer rule that modifies plan output attributes on versions where Spark handles UDTs in the vectorized Parquet reader natively.

## Steps to reproduce the problem

1. Run Sedona on Spark 4.1+
2. Read a GeoParquet file with nested geometry columns (e.g., array of struct containing GeometryUDT)
3. Observe that `TransformNestedUDTParquet` still transforms the schema even though Spark 4.1 handles it

## Settings

Sedona version = 1.8.x / master

Apache Spark version = 4.1+

API type = Scala

## Context

- PR #2359 introduced `TransformNestedUDTParquet` to work around SPARK-48942, which caused the vectorized Parquet reader to crash on nested UDTs.
- [SPARK-52651](https://issues.apache.org/jira/browse/SPARK-52651) (merged in Spark 4.1) fixes this at the Spark level by recursively stripping UDTs in `ColumnVector`.
- The workaround should be version-gated to only run on Spark < 4.1.
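
A minimal sketch of what such a version gate could look like, assuming the check is made against the Spark version string (for example the value of `spark.version`). The object name `NestedUdtWorkaroundGate` and the method `needsWorkaround` are hypothetical and not taken from the Sedona codebase; how and where Sedona registers `TransformNestedUDTParquet` is not shown here.

```scala
// Hypothetical helper: decides whether the nested-UDT Parquet workaround
// is still needed for a given Spark version string.
object NestedUdtWorkaroundGate {

  // Captures the leading "major.minor" of strings such as "4.1.0" or "3.5.1-SNAPSHOT".
  private val MajorMinor = """^(\d+)\.(\d+).*""".r

  /** Returns true for Spark versions below 4.1, where SPARK-52651 is not available. */
  def needsWorkaround(sparkVersion: String): Boolean = sparkVersion match {
    case MajorMinor(major, minor) =>
      val (maj, min) = (major.toInt, minor.toInt)
      maj < 4 || (maj == 4 && min < 1)
    case _ =>
      // Assumption: if the version cannot be parsed, keep the workaround enabled,
      // since the issue describes the extra rule as unnecessary rather than harmful.
      true
  }
}
```

With a gate like this, the rule registration would be wrapped in `if (NestedUdtWorkaroundGate.needsWorkaround(spark.version)) { ... }`. Comparing major and minor numerically, rather than comparing strings, avoids misordering versions such as 4.10 against 4.2.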
