[PR] Spark 4.1: Read and write geometry and geography values in Parquet [iceberg]

via GitHub Fri, 03 Jul 2026 14:47:26 -0700


huan233usc opened a new pull request, #17073:
URL: https://github.com/apache/iceberg/pull/17073


   Follow-up to the geo type work: the Spark type mapping (#16851) and Iceberg's
   own Parquet value path (#16982) are in place, but the Spark Parquet
   reader/writer did not handle geometry/geography values.
   
   Geometry and geography columns carry a Parquet `LogicalTypeAnnotation` with no
   legacy `OriginalType`. `SparkParquetReaders` and `SparkParquetWriters` dispatch
   on the `OriginalType` / logical-type annotation, and neither path handled
   geometry or geography, so:
   - the reader fell through to the physical `BINARY` case and returned a raw
     `byte[]`, which is the wrong in-memory type for a geo column (Spark's
     `InternalRow.getGeometry` / `getGeography` expect `GeometryVal` /
     `GeographyVal`);
   - the writer hit the unsupported-logical-type branch and threw.
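   To illustrate the reader-side fall-through, here is a minimal, self-contained
   sketch. The `Column`, `OriginalType`, and `LogicalKind` types are hypothetical
   stand-ins, not the parquet-java or Iceberg API: a dispatcher that only consults
   the legacy `OriginalType` falls through to the raw-bytes case for a column that
   has only a logical-type annotation, while checking the geo logical types first
   selects a geo-aware reader.

```java
import java.util.Optional;

// Hypothetical, simplified model of a Parquet column's type metadata.
// This is NOT the parquet-java API; it only illustrates the dispatch problem.
public class GeoDispatchSketch {
  enum OriginalType { UTF8, DECIMAL }                      // legacy annotations (subset)
  enum LogicalKind { STRING, DECIMAL, GEOMETRY, GEOGRAPHY } // logical annotations (subset)

  record Column(String name, Optional<OriginalType> originalType, Optional<LogicalKind> logical) {}

  // Old behavior (sketch): only the legacy OriginalType is consulted, so a geo
  // column, which has no OriginalType, falls through to the physical BINARY case.
  static String readerForOld(Column c) {
    if (c.originalType().isPresent()) {
      return switch (c.originalType().get()) {
        case UTF8 -> "string reader";
        case DECIMAL -> "decimal reader";
      };
    }
    return "raw byte[] reader";
  }

  // New behavior (sketch): check the geo logical types first, then defer
  // to the existing dispatch for everything else.
  static String readerForNew(Column c) {
    if (c.logical().isPresent()) {
      switch (c.logical().get()) {
        case GEOMETRY:
          return "GeometryVal reader";
        case GEOGRAPHY:
          return "GeographyVal reader";
        default:
          break;
      }
    }
    return readerForOld(c);
  }

  public static void main(String[] args) {
    Column geom = new Column("geom", Optional.empty(), Optional.of(LogicalKind.GEOMETRY));
    System.out.println(readerForOld(geom)); // raw byte[] reader
    System.out.println(readerForNew(geom)); // GeometryVal reader
  }
}
```

   The writer side fails analogously; in this simplified model it would correspond
   to an unknown logical kind reaching a branch that throws instead of returning a
   writer.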
   
   This change reads a WKB `BINARY` column into Spark's `GeometryVal` /
   `GeographyVal` and writes those values back as their WKB bytes, mirroring the
   existing binary handling. Geo values are stored as pure WKB, so no
   transformation is needed beyond wrapping/unwrapping the byte payload.
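   Because the stored payload is already WKB, the read/write path reduces to a
   wrap/unwrap of the byte array. The sketch below assumes a hypothetical
   `WkbValue` record as a stand-in for Spark's `GeometryVal` / `GeographyVal`
   (their real constructors and accessors are not shown here). It encodes
   `POINT(1 2)` as little-endian WKB (1 byte-order byte, a 4-byte type code, two
   8-byte doubles: 21 bytes), wraps it as a reader would, unwraps it as a writer
   would, and checks that the bytes and nulls survive unchanged.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Sketch only: WkbValue is a hypothetical stand-in for GeometryVal / GeographyVal.
public class WkbRoundTripSketch {
  record WkbValue(byte[] wkb) {}

  // Encode POINT(x y) as little-endian WKB:
  // 1 byte order + 4 byte geometry type + 2 doubles = 21 bytes.
  static byte[] pointWkb(double x, double y) {
    ByteBuffer buf = ByteBuffer.allocate(21).order(ByteOrder.LITTLE_ENDIAN);
    buf.put((byte) 1); // 1 = little-endian (NDR)
    buf.putInt(1);     // WKB geometry type 1 = Point
    buf.putDouble(x);
    buf.putDouble(y);
    return buf.array();
  }

  // "Reader" step: wrap the BINARY payload unchanged; a null cell stays null.
  static WkbValue read(byte[] binary) {
    return binary == null ? null : new WkbValue(binary);
  }

  // "Writer" step: unwrap back to the WKB payload; a null value stays null.
  static byte[] write(WkbValue value) {
    return value == null ? null : value.wkb();
  }

  public static void main(String[] args) {
    byte[] stored = pointWkb(1.0, 2.0);
    byte[] rewritten = write(read(stored));
    System.out.println(Arrays.equals(stored, rewritten)); // true
    System.out.println(write(read(null)) == null);        // true
  }
}
```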
   
   Testing:
   - Enables the shared geospatial `DataTest` coverage for the Spark Parquet
     reader (`supportsGeospatial()`), exercising geometry and geography read
     round-trips through `SparkParquetReaders`.
   - Adds a Spark writer round-trip test (`TestSparkParquetWriter`) that writes
     `GeometryVal` / `GeographyVal` through `SparkParquetWriters` and reads them
     back, including null values.
   
   Vectorized (Arrow) geo reads are out of scope and remain a follow-up.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
