zhangfengcdt commented on code in PR #2359:
URL: https://github.com/apache/sedona/pull/2359#discussion_r2368973423


##########
spark/common/src/test/scala/org/apache/sedona/sql/geoparquetIOTests.scala:
##########
@@ -784,6 +784,68 @@ class geoparquetIOTests extends TestBaseScala with BeforeAndAfterAll {
     }
   }
 
+  describe("Fix SPARK-48942 reading parquet with array of structs of UDTs workaround") {
+    it("should handle array of struct with geometry UDT") {
+      // This test reproduces the issue described in SPARK-48942
+      // https://issues.apache.org/jira/browse/SPARK-48942
+      // where reading back nested geometry from Parquet with PySpark 3.5 fails
+      val testPath = geoparquetoutputlocation + "/spark_48942_test.parquet"
+
+      // Create DataFrame with array of struct containing geometry
+      val df = sparkSession.sql("""
+        SELECT ARRAY(STRUCT(ST_POINT(1.0, 1.1) AS geometry)) AS nested_geom_array
+      """)
+
+      // Write to Parquet
+      df.write.mode("overwrite").format("parquet").save(testPath)
+
+      // The fix allows vectorized reading to handle UDT compatibility properly
+      val readDf = sparkSession.read.format("parquet").load(testPath)
+
+      // Verify the geometry data is correct
+      val result = readDf.collect()
+      assert(result.length == 1)
+      val nestedArray = result(0).getSeq[Any](0)
+      assert(nestedArray.length == 1)

Review Comment:
   The type read back is actually geometry. The Parquet file stores the
GeometryUDT information in its Spark schema metadata, and on read Spark
restores it automatically from the SPARK_METADATA_KEY entry.
   
   The TransformNestedUDTParquet rule only ensures that a nested GeometryUDT
is handled properly regardless of which metadata source is used.
   
   I have added some tests after the read-back to check that the regular
geometry operations work.
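
   For illustration only, a minimal sketch of what such a post-read check
could look like. This is not the code added in the PR: the object name, the
temporary path and the local session setup are assumptions, and it presumes a
Sedona build that already includes this fix.

```scala
import org.apache.sedona.spark.SedonaContext
import org.apache.spark.sql.types.{ArrayType, StructType}

// Hypothetical standalone check; the PR's real assertions live in geoparquetIOTests.
object NestedGeometryReadBack {
  def main(args: Array[String]): Unit = {
    // Assumption: local session with Sedona on the classpath.
    val spark = SedonaContext.create(
      SedonaContext.builder().master("local[*]").appName("nested-udt-readback").getOrCreate())

    // Hypothetical scratch location, not the path used by the test suite.
    val path = java.nio.file.Files
      .createTempDirectory("nested_udt").resolve("data.parquet").toString

    // Same shape as in the test: an array of structs holding a geometry.
    spark.sql("SELECT ARRAY(STRUCT(ST_POINT(1.0, 1.1) AS geometry)) AS nested_geom_array")
      .write.mode("overwrite").format("parquet").save(path)

    val readDf = spark.read.format("parquet").load(path)

    // Print the data type Spark restored for the nested field.
    val elementType =
      readDf.schema("nested_geom_array").dataType.asInstanceOf[ArrayType].elementType
    println(elementType.asInstanceOf[StructType]("geometry").dataType)

    // Run ordinary geometry functions on the nested value.
    val row = readDf
      .selectExpr("nested_geom_array[0].geometry AS geom")
      .selectExpr("ST_X(geom) AS x", "ST_Y(geom) AS y")
      .head()
    assert(row.getDouble(0) == 1.0)
    assert(row.getDouble(1) == 1.1)

    spark.stop()
  }
}
```

   If the nested field came back as plain binary rather than geometry, the
ST_X / ST_Y calls would be expected to fail, which is why running a geometry
function is a stricter check than only counting the array elements.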



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.