(sedona) branch master updated: [GH-2664] GeoParquet writer utilizes geometry SRID to produce projjson CRS metadata (#2667)

jiayu Sat, 21 Feb 2026 02:28:07 -0800

This is an automated email from the ASF dual-hosted git repository.

jiayu pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/sedona.git



The following commit(s) were added to refs/heads/master by this push:
     new b288870a96 [GH-2664] GeoParquet writer utilizes geometry SRID to produce projjson CRS metadata (#2667)
b288870a96 is described below

commit b288870a96c84cb8e94d627cfb5e29791c4ce611
Author: Jia Yu <[email protected]>
AuthorDate: Sat Feb 21 03:27:04 2026 -0700

    [GH-2664] GeoParquet writer utilizes geometry SRID to produce projjson CRS metadata (#2667)
---
 docs/tutorial/files/geoparquet-sedona-spark.md     |  13 +-
 pom.xml                                            |   2 +-
 .../geoparquet/GeoParquetMetaData.scala            |  28 +++++
 .../geoparquet/GeoParquetWriteSupport.scala        |  56 ++++++++-
 .../org/apache/sedona/sql/geoparquetIOTests.scala  | 136 +++++++++++++++++++++
 5 files changed, 225 insertions(+), 10 deletions(-)
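
The CRS selection this commit adds to the GeoParquet writer can be summarized with a small, dependency-free model. This is only a sketch under stated assumptions: `CrsField`, `SridTracker`, and `CrsResolution` are illustrative names, not Sedona APIs, and `CrsFromEpsg` merely stands in for the PROJJSON that the real writer obtains from proj4sedona. The model also leaves out the writer's fallback to a `null` CRS when the SRID-to-PROJJSON conversion fails.

```scala
// Simplified model of the writer's CRS decision.
// All names here are hypothetical; none of them are Sedona APIs.
sealed trait CrsField
case object CrsOmitted extends CrsField                  // `crs` key absent; readers assume OGC:CRS84
case object CrsNull extends CrsField                     // `crs` written as JSON null
final case class CrsFromEpsg(code: Int) extends CrsField // stands in for the generated PROJJSON

// Mirrors the per-column SRID tracking in the diff: -1 means "no geometry seen yet".
final class SridTracker {
  private var srid: Int = -1
  private var mixed: Boolean = false

  def update(geomSrid: Int): Unit =
    if (!mixed) {
      if (srid == -1) srid = geomSrid
      else if (srid != geomSrid) mixed = true
    }

  // Some(srid) only when every geometry seen so far carried the same SRID.
  def observedSrid: Option[Int] =
    if (mixed || srid == -1) None else Some(srid)
}

object CrsResolution {
  val DefaultSrid = 4326

  // `explicit` models a user-supplied geoparquet.crs option, which always wins.
  def resolve(srids: Seq[Int], explicit: Option[CrsField]): CrsField =
    explicit.getOrElse {
      val tracker = new SridTracker
      srids.foreach(tracker.update)
      tracker.observedSrid match {
        case Some(DefaultSrid)      => CrsOmitted        // 4326 is the GeoParquet default
        case Some(srid) if srid > 0 => CrsFromEpsg(srid) // consistent, known SRID
        case _                      => CrsNull           // SRID 0, mixed SRIDs, or empty column
      }
    }
}
```

Under this model, a column whose geometries all carry SRID 32632 resolves to `CrsFromEpsg(32632)`, while a column mixing 4326 and 32632 resolves to `CrsNull`, matching the cases exercised by the new tests in this commit.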

diff --git a/docs/tutorial/files/geoparquet-sedona-spark.md b/docs/tutorial/files/geoparquet-sedona-spark.md
index 833437f577..643d65e467 100644
--- a/docs/tutorial/files/geoparquet-sedona-spark.md
+++ b/docs/tutorial/files/geoparquet-sedona-spark.md
@@ -198,14 +198,19 @@ df.write.format("geoparquet")
 
 The value of `geoparquet.crs` and `geoparquet.crs.<column_name>` can be one of the following:
 
-* `"null"`: Explicitly setting `crs` field to `null`. This is the default behavior.
+* `"null"`: Explicitly setting `crs` field to `null`. This is the default behavior when geometry SRID is 0.
 * `""` (empty string): Omit the `crs` field. This implies that the CRS is [OGC:CRS84](https://www.opengis.net/def/crs/OGC/1.3/CRS84) for CRS-aware implementations.
 * `"{...}"` (PROJJSON string): The `crs` field will be set as the PROJJSON object representing the Coordinate Reference System (CRS) of the geometry. You can find the PROJJSON string of a specific CRS from here: https://epsg.io/ (click the JSON option at the bottom of the page). You can also customize your PROJJSON string as needed.
 
-Please note that Sedona currently cannot set/get a projjson string to/from a CRS. Its geoparquet reader will ignore the projjson metadata and you will have to set your CRS via [`ST_SetSRID`](../../api/sql/Function.md#st_setsrid) after reading the file.
-Its geoparquet writer will not leverage the SRID field of a geometry so you will have to always set the `geoparquet.crs` option manually when writing the file, if you want to write a meaningful CRS field.
+### Automatic CRS from SRID
 
-Due to the same reason, Sedona geoparquet reader and writer do NOT check the axis order (lon/lat or lat/lon) and assume they are handled by the users themselves when writing / reading the files. You can always use [`ST_FlipCoordinates`](../../api/sql/Function.md#st_flipcoordinates) to swap the axis order of your geometries.
+When no `geoparquet.crs` option is explicitly provided, Sedona will automatically derive the CRS PROJJSON from the SRID of the geometry column. For example, if all geometries in a column have SRID 32632 (set via [`ST_SetSRID`](../../api/sql/Function.md#st_setsrid)), the writer will automatically produce the PROJJSON for EPSG:32632 in the GeoParquet metadata. For SRID 4326, the CRS field is omitted since this is the GeoParquet default (OGC:CRS84).
+
+* If the SRID is 0 (the default for geometries without an explicit SRID), the `crs` field will be set to `null`.
+* If geometries in a column have mixed SRIDs, the `crs` field defaults to `null`.
+* If an explicit `geoparquet.crs` or `geoparquet.crs.<column_name>` option is provided, it always takes precedence over the SRID-derived CRS.
+
+Sedona geoparquet reader and writer do NOT check the axis order (lon/lat or lat/lon) and assume they are handled by the users themselves when writing / reading the files. You can always use [`ST_FlipCoordinates`](../../api/sql/Function.md#st_flipcoordinates) to swap the axis order of your geometries.
 
 ## Save GeoParquet with Covering Metadata
 
diff --git a/pom.xml b/pom.xml
index 4e8ec472e8..6a113fabd1 100644
--- a/pom.xml
+++ b/pom.xml
@@ -96,7 +96,7 @@
         <scala-collection-compat.version>2.5.0</scala-collection-compat.version>
         <geoglib.version>1.52</geoglib.version>
         <caffeine.version>2.9.2</caffeine.version>
-        <proj4sedona.version>0.0.5</proj4sedona.version>
+        <proj4sedona.version>0.0.6</proj4sedona.version>
 
         <geotools.scope>provided</geotools.scope>
         <!-- Because it's not in Maven central, make it provided by default -->
diff --git a/spark/common/src/main/scala/org/apache/spark/sql/execution/datasources/geoparquet/GeoParquetMetaData.scala b/spark/common/src/main/scala/org/apache/spark/sql/execution/datasources/geoparquet/GeoParquetMetaData.scala
index a108e3bafa..053adf6fc7 100644
--- a/spark/common/src/main/scala/org/apache/spark/sql/execution/datasources/geoparquet/GeoParquetMetaData.scala
+++ b/spark/common/src/main/scala/org/apache/spark/sql/execution/datasources/geoparquet/GeoParquetMetaData.scala
@@ -22,6 +22,7 @@ import scala.util.control.NonFatal
 
 import org.apache.spark.sql.types.{DoubleType, FloatType, StructType}
 import org.datasyslab.proj4sedona.core.Proj
+import org.datasyslab.proj4sedona.parser.CRSSerializer
 import org.json4s.jackson.JsonMethods.parse
 import org.json4s.jackson.compactJson
 import org.json4s.{DefaultFormats, Extraction, JField, JNothing, JNull, JObject, JValue}
@@ -203,6 +204,33 @@ object GeoParquetMetaData {
     }
   }
 
+  /**
+   * Convert an SRID to a PROJJSON JValue using proj4sedona.
+   *
+   * The generated PROJJSON includes an `id` field with the EPSG authority and code, which enables
+   * round-trip SRID preservation when reading the GeoParquet file back.
+   *
+   * @param srid
+   *   The SRID to convert (e.g., 4326 for WGS 84).
+   * @return
+   *   Some(JValue) containing the PROJJSON if conversion succeeds, None if the SRID is 0
+   *   (unknown), 4326 (GeoParquet default CRS), or if conversion fails.
+   */
+  def sridToProjJson(srid: Int): Option[JValue] = {
+    if (srid == 0 || srid == DEFAULT_SRID) return None
+    try {
+      val proj = new Proj("EPSG:" + srid)
+      val projjsonStr = CRSSerializer.toProjJson(proj)
+      if (projjsonStr != null && projjsonStr.nonEmpty) {
+        Some(parse(projjsonStr))
+      } else {
+        None
+      }
+    } catch {
+      case NonFatal(_) => None
+    }
+  }
+
   def createCoveringColumnMetadata(coveringColumnName: String, schema: StructType): Covering = {
     val coveringColumnIndex = schema.fieldIndex(coveringColumnName)
     schema(coveringColumnIndex).dataType match {
diff --git a/spark/common/src/main/scala/org/apache/spark/sql/execution/datasources/geoparquet/GeoParquetWriteSupport.scala b/spark/common/src/main/scala/org/apache/spark/sql/execution/datasources/geoparquet/GeoParquetWriteSupport.scala
index 48655e5977..ca6f7e090e 100644
--- a/spark/common/src/main/scala/org/apache/spark/sql/execution/datasources/geoparquet/GeoParquetWriteSupport.scala
+++ b/spark/common/src/main/scala/org/apache/spark/sql/execution/datasources/geoparquet/GeoParquetWriteSupport.scala
@@ -108,6 +108,7 @@ class GeoParquetWriteSupport extends WriteSupport[InternalRow] with Logging {
 
   private var geoParquetVersion: Option[String] = None
   private var defaultGeoParquetCrs: Option[JValue] = None
+  private var userExplicitlySetDefaultCrs: Boolean = false
   private val geoParquetColumnCrsMap: mutable.Map[String, Option[JValue]] = mutable.Map.empty
   private val geoParquetColumnCoveringMap: mutable.Map[String, Covering] = mutable.Map.empty
   private val generatedCoveringColumnOrdinals: mutable.Map[Int, Int] = mutable.Map.empty
@@ -156,11 +157,16 @@ class GeoParquetWriteSupport extends WriteSupport[InternalRow] with Logging {
     }
     defaultGeoParquetCrs = configuration.get(GEOPARQUET_CRS_KEY) match {
       case null =>
-        // If no CRS is specified, we write null to the crs metadata field. This is for compatibility with
-        // geopandas 0.10.0 and earlier versions, which requires crs field to be present.
+        // If no CRS is specified, we default to deriving CRS from the geometry SRID in finalizeWrite.
+        // This JNull value is used as a fallback when SRID is 0 or SRID-to-PROJJSON conversion fails,
+        // maintaining compatibility with geopandas 0.10.0 and earlier versions, which require a crs field.
         Some(org.json4s.JNull)
-      case "" => None
-      case crs: String => Some(parse(crs))
+      case "" =>
+        userExplicitlySetDefaultCrs = true
+        None
+      case crs: String =>
+        userExplicitlySetDefaultCrs = true
+        Some(parse(crs))
     }
     geometryColumnInfoMap.keys.map(schema(_).name).foreach { name =>
       Option(configuration.get(GEOPARQUET_CRS_KEY + "." + name)).foreach {
@@ -246,7 +252,21 @@ class GeoParquetWriteSupport extends WriteSupport[InternalRow] with Logging {
             columnInfo.bbox.maxX,
             columnInfo.bbox.maxY)
         } else Seq(0.0, 0.0, 0.0, 0.0)
-        val crs = geoParquetColumnCrsMap.getOrElse(columnName, defaultGeoParquetCrs)
+        val crs = geoParquetColumnCrsMap.getOrElse(
+          columnName, {
+            if (!userExplicitlySetDefaultCrs) {
+              // No explicit CRS option was provided; try to derive from geometry SRID.
+              // For SRID 4326 (OGC:CRS84), omit CRS entirely per GeoParquet spec default.
+              columnInfo.observedSrid match {
+                case Some(srid) if srid == GeoParquetMetaData.DEFAULT_SRID => None
+                case Some(srid) if srid > 0 =>
+                  GeoParquetMetaData.sridToProjJson(srid).orElse(defaultGeoParquetCrs)
+                case _ => defaultGeoParquetCrs
+              }
+            } else {
+              defaultGeoParquetCrs
+            }
+          })
         val covering = geoParquetColumnCoveringMap.get(columnName)
         columnName -> GeometryFieldMetaData("WKB", geometryTypes, bbox, crs, covering)
       }.toMap
@@ -712,6 +732,22 @@ object GeoParquetWriteSupport {
     // that are present in the column.
     val seenGeometryTypes: mutable.Set[String] = mutable.Set.empty
 
+    // Track SRIDs seen in geometry values. A consistent SRID can be used to
+    // auto-generate CRS (projjson) metadata when no explicit CRS is provided:
+    // SRID 4326 results in omitted CRS (GeoParquet default), positive non-4326
+    // SRIDs generate PROJJSON, and SRID 0 or mixed SRIDs result in null CRS.
+    private var _srid: Int = -1 // -1 = no geometries seen yet
+    private var _mixedSrids: Boolean = false
+
+    /**
+     * Returns the observed SRID if all geometries had the same SRID, or None if no geometries
+     * were seen or if mixed SRIDs were encountered.
+     */
+    def observedSrid: Option[Int] = {
+      if (_mixedSrids || _srid == -1) None
+      else Some(_srid)
+    }
+
     def update(geom: Geometry): Unit = {
       bbox.update(geom)
       // In case of 3D geometries, a " Z" suffix gets added (e.g. ["Point Z"]).
@@ -721,6 +757,16 @@ object GeoParquetWriteSupport {
       }
       val geometryType = if (!hasZ) geom.getGeometryType else geom.getGeometryType + " Z"
       seenGeometryTypes.add(geometryType)
+
+      // Track SRID consistency across all geometries in this column
+      if (!_mixedSrids) {
+        val geomSrid = geom.getSRID
+        if (_srid == -1) {
+          _srid = geomSrid
+        } else if (_srid != geomSrid) {
+          _mixedSrids = true
+        }
+      }
     }
   }
 
diff --git a/spark/common/src/test/scala/org/apache/sedona/sql/geoparquetIOTests.scala b/spark/common/src/test/scala/org/apache/sedona/sql/geoparquetIOTests.scala
index 9f3bc97d27..3041757ed2 100644
--- a/spark/common/src/test/scala/org/apache/sedona/sql/geoparquetIOTests.scala
+++ b/spark/common/src/test/scala/org/apache/sedona/sql/geoparquetIOTests.scala
@@ -572,6 +572,142 @@ class geoparquetIOTests extends TestBaseScala with BeforeAndAfterAll {
       }
     }
 
+    it("GeoParquet save should omit CRS for SRID 4326 per GeoParquet default") {
+      val wktReader = new WKTReader()
+      val geom = wktReader.read("POINT (1 2)")
+      geom.setSRID(4326)
+      val testData = Seq(Row(1, geom))
+      val schema = StructType(
+        Seq(
+          StructField("id", IntegerType, nullable = false),
+          StructField("geometry", GeometryUDT(), nullable = false)))
+      val df = sparkSession.createDataFrame(testData.asJava, schema).repartition(1)
+      val geoParquetSavePath = geoparquetoutputlocation + "/gp_srid_4326_omit_crs.parquet"
+      df.write.format("geoparquet").mode("overwrite").save(geoParquetSavePath)
+      validateGeoParquetMetadata(geoParquetSavePath) { geo =>
+        val crs = geo \ "columns" \ "geometry" \ "crs"
+        // SRID 4326 = OGC:CRS84, the GeoParquet default. CRS field should be omitted.
+        assert(
+          crs == org.json4s.JNothing,
+          s"Expected omitted CRS for SRID 4326 (GeoParquet default), got $crs")
+      }
+      // Round-trip: read back and verify SRID is preserved (omitted CRS -> 4326)
+      val df2 = sparkSession.read.format("geoparquet").load(geoParquetSavePath)
+      val geoms = df2.select("geometry").collect().map(_.getAs[Geometry](0))
+      geoms.foreach { g =>
+        assert(g.getSRID == 4326, s"Expected SRID 4326 after round-trip, got ${g.getSRID}")
+      }
+    }
+
+    it("GeoParquet save should auto-generate projjson from non-default SRID") {
+      val wktReader = new WKTReader()
+      val geom = wktReader.read("POINT (500000 4649776)")
+      geom.setSRID(32632)
+      val testData = Seq(Row(1, geom))
+      val schema = StructType(
+        Seq(
+          StructField("id", IntegerType, nullable = false),
+          StructField("geometry", GeometryUDT(), nullable = false)))
+      val df = sparkSession.createDataFrame(testData.asJava, schema).repartition(1)
+      val geoParquetSavePath = geoparquetoutputlocation + "/gp_auto_crs_from_srid_32632.parquet"
+      df.write.format("geoparquet").mode("overwrite").save(geoParquetSavePath)
+      validateGeoParquetMetadata(geoParquetSavePath) { geo =>
+        implicit val formats: org.json4s.Formats = org.json4s.DefaultFormats
+        val crs = geo \ "columns" \ "geometry" \ "crs"
+        // CRS should be auto-generated from SRID 32632
+        assert(
+          crs.isInstanceOf[org.json4s.JObject],
+          s"Expected JObject for auto-generated CRS, got $crs")
+        val authority = (crs \ "id" \ "authority").extract[String]
+        val code = (crs \ "id" \ "code").extract[Int]
+        assert(authority == "EPSG")
+        assert(code == 32632)
+      }
+      // Round-trip: read back and verify SRID is preserved
+      val df2 = sparkSession.read.format("geoparquet").load(geoParquetSavePath)
+      val geoms = df2.select("geometry").collect().map(_.getAs[Geometry](0))
+      geoms.foreach { g =>
+        assert(g.getSRID == 32632, s"Expected SRID 32632 after round-trip, got ${g.getSRID}")
+      }
+    }
+
+    it("GeoParquet save should keep crs null when geometry SRID is 0") {
+      val wktReader = new WKTReader()
+      val geom = wktReader.read("POINT (1 2)")
+      // SRID defaults to 0
+      assert(geom.getSRID == 0)
+      val testData = Seq(Row(1, geom))
+      val schema = StructType(
+        Seq(
+          StructField("id", IntegerType, nullable = false),
+          StructField("geometry", GeometryUDT(), nullable = false)))
+      val df = sparkSession.createDataFrame(testData.asJava, schema).repartition(1)
+      val geoParquetSavePath = geoparquetoutputlocation + "/gp_srid_zero_crs_null.parquet"
+      df.write.format("geoparquet").mode("overwrite").save(geoParquetSavePath)
+      validateGeoParquetMetadata(geoParquetSavePath) { geo =>
+        val crs = geo \ "columns" \ "geometry" \ "crs"
+        assert(crs == org.json4s.JNull, s"Expected null CRS for SRID 0, got $crs")
+      }
+    }
+
+    it("GeoParquet save should use explicit CRS option over SRID-derived CRS") {
+      val wktReader = new WKTReader()
+      val geom = wktReader.read("POINT (1 2)")
+      geom.setSRID(4326)
+      val testData = Seq(Row(1, geom))
+      val schema = StructType(
+        Seq(
+          StructField("id", IntegerType, nullable = false),
+          StructField("geometry", GeometryUDT(), nullable = false)))
+      val df = sparkSession.createDataFrame(testData.asJava, schema).repartition(1)
+      val geoParquetSavePath =
+        geoparquetoutputlocation + "/gp_explicit_crs_overrides_srid.parquet"
+
+      // Explicitly set CRS to null — should override SRID-derived CRS
+      df.write
+        .format("geoparquet")
+        .option("geoparquet.crs", "null")
+        .mode("overwrite")
+        .save(geoParquetSavePath)
+      validateGeoParquetMetadata(geoParquetSavePath) { geo =>
+        val crs = geo \ "columns" \ "geometry" \ "crs"
+        assert(crs == org.json4s.JNull, s"Expected null CRS when explicitly set, got $crs")
+      }
+
+      // Explicitly omit CRS — should override SRID-derived CRS
+      df.write
+        .format("geoparquet")
+        .option("geoparquet.crs", "")
+        .mode("overwrite")
+        .save(geoParquetSavePath)
+      validateGeoParquetMetadata(geoParquetSavePath) { geo =>
+        val crs = geo \ "columns" \ "geometry" \ "crs"
+        assert(
+          crs == org.json4s.JNothing,
+          s"Expected omitted CRS when explicitly set to empty, got $crs")
+      }
+    }
+
+    it("GeoParquet save should keep crs null for mixed SRIDs in one column") {
+      val wktReader = new WKTReader()
+      val geom1 = wktReader.read("POINT (1 2)")
+      geom1.setSRID(4326)
+      val geom2 = wktReader.read("POINT (3 4)")
+      geom2.setSRID(32632)
+      val testData = Seq(Row(1, geom1), Row(2, geom2))
+      val schema = StructType(
+        Seq(
+          StructField("id", IntegerType, nullable = false),
+          StructField("geometry", GeometryUDT(), nullable = false)))
+      val df = sparkSession.createDataFrame(testData.asJava, schema).repartition(1)
+      val geoParquetSavePath = geoparquetoutputlocation + "/gp_mixed_srid.parquet"
+      df.write.format("geoparquet").mode("overwrite").save(geoParquetSavePath)
+      validateGeoParquetMetadata(geoParquetSavePath) { geo =>
+        val crs = geo \ "columns" \ "geometry" \ "crs"
+        assert(crs == org.json4s.JNull, s"Expected null CRS for mixed SRIDs, got $crs")
+      }
+    }
+
     it("GeoParquet read should set SRID from PROJJSON CRS with EPSG identifier") {
       val df = sparkSession.read.format("geoparquet").load(geoparquetdatalocation4)
       val projjson =
