Copilot commented on code in PR #2667:
URL: https://github.com/apache/sedona/pull/2667#discussion_r2836031747


##########
spark/common/src/main/scala/org/apache/spark/sql/execution/datasources/geoparquet/GeoParquetWriteSupport.scala:
##########
@@ -712,6 +731,20 @@ object GeoParquetWriteSupport {
     // that are present in the column.
     val seenGeometryTypes: mutable.Set[String] = mutable.Set.empty
 
+    // Track SRIDs seen in geometry values. A consistent non-zero SRID can be 
used
+    // to auto-generate CRS (projjson) metadata when no explicit CRS is 
provided.

Review Comment:
   The comment mentions "A consistent non-zero SRID" but the actual 
implementation also handles SRID 0 (it falls back to null CRS). Consider 
clarifying: "Track SRIDs seen in geometry values. A consistent SRID can be used 
to auto-generate CRS (projjson) metadata when no explicit CRS is provided: SRID 
4326 results in omitted CRS (GeoParquet default), positive non-4326 SRIDs 
generate PROJJSON, and SRID 0 or mixed SRIDs result in null CRS."
   ```suggestion
       // Track SRIDs seen in geometry values. A consistent SRID can be used to
       // auto-generate CRS (projjson) metadata when no explicit CRS is 
provided:
       // SRID 4326 results in omitted CRS (GeoParquet default), positive 
non-4326
       // SRIDs generate PROJJSON, and SRID 0 or mixed SRIDs result in null CRS.
   ```



##########
spark/common/src/main/scala/org/apache/spark/sql/execution/datasources/geoparquet/GeoParquetMetaData.scala:
##########
@@ -203,6 +204,33 @@ object GeoParquetMetaData {
     }
   }
 
+  /**
+   * Convert an SRID to a PROJJSON JValue using proj4sedona.
+   *
+   * The generated PROJJSON includes an `id` field with the EPSG authority and 
code, which enables
+   * round-trip SRID preservation when reading the GeoParquet file back.
+   *
+   * @param srid
+   *   The SRID to convert (e.g., 4326 for WGS 84).
+   * @return
+   *   Some(JValue) containing the PROJJSON if conversion succeeds, None if 
the SRID is 0 or
+   *   unknown.

Review Comment:
   The documentation comment should explicitly mention that None is returned 
for SRID 4326 (DEFAULT_SRID) since this is the GeoParquet default. The current 
comment only mentions "SRID is 0 or unknown" but doesn't explain why 4326 also 
returns None. Suggested text: "None if the SRID is 0 (unknown), 4326 
(GeoParquet default CRS), or if conversion fails."
   ```suggestion
      *   Some(JValue) containing the PROJJSON if conversion succeeds, None if 
the SRID is 0
      *   (unknown), 4326 (GeoParquet default CRS), or if conversion fails.
   ```



##########
spark/common/src/main/scala/org/apache/spark/sql/execution/datasources/geoparquet/GeoParquetWriteSupport.scala:
##########
@@ -159,8 +160,12 @@ class GeoParquetWriteSupport extends 
WriteSupport[InternalRow] with Logging {
         // If no CRS is specified, we write null to the crs metadata field. 
This is for compatibility with
         // geopandas 0.10.0 and earlier versions, which requires crs field to 
be present.

Review Comment:
   The comment here is now misleading. While it states that "we write null to 
the crs metadata field" for compatibility with geopandas 0.10.0, the actual 
behavior has changed with this PR. When no explicit CRS option is provided, the 
code now derives CRS from SRID (see lines 256-267 in finalizeWrite). The null 
value is only written as a fallback when SRID is 0 or when SRID-to-PROJJSON 
conversion fails. Consider updating this comment to reflect the new behavior, 
e.g., "If no CRS is specified, we default to deriving CRS from geometry SRID. 
This value serves as a fallback for cases where SRID is 0 or conversion fails, 
maintaining compatibility with geopandas 0.10.0 and earlier."
   ```suggestion
           // If no CRS is specified, we default to deriving CRS from the 
geometry SRID in finalizeWrite.
           // This JNull value is used as a fallback when SRID is 0 or 
SRID-to-PROJJSON conversion fails,
           // maintaining compatibility with geopandas 0.10.0 and earlier 
versions, which require a crs field.
   ```



##########
docs/tutorial/files/geoparquet-sedona-spark.md:
##########
@@ -198,14 +198,19 @@ df.write.format("geoparquet")
 
 The value of `geoparquet.crs` and `geoparquet.crs.<column_name>` can be one of 
the following:
 
-* `"null"`: Explicitly setting `crs` field to `null`. This is the default 
behavior.
+* `"null"`: Explicitly setting `crs` field to `null`. This is the default 
behavior when geometry SRID is 0.
 * `""` (empty string): Omit the `crs` field. This implies that the CRS is 
[OGC:CRS84](https://www.opengis.net/def/crs/OGC/1.3/CRS84) for CRS-aware 
implementations.
 * `"{...}"` (PROJJSON string): The `crs` field will be set as the PROJJSON 
object representing the Coordinate Reference System (CRS) of the geometry. You 
can find the PROJJSON string of a specific CRS from here: https://epsg.io/ 
(click the JSON option at the bottom of the page). You can also customize your 
PROJJSON string as needed.
 
-Please note that Sedona currently cannot set/get a projjson string to/from a 
CRS. Its geoparquet reader will ignore the projjson metadata and you will have 
to set your CRS via [`ST_SetSRID`](../../api/sql/Function.md#st_setsrid) after 
reading the file.
-Its geoparquet writer will not leverage the SRID field of a geometry so you 
will have to always set the `geoparquet.crs` option manually when writing the 
file, if you want to write a meaningful CRS field.
+### Automatic CRS from SRID
 
-Due to the same reason, Sedona geoparquet reader and writer do NOT check the 
axis order (lon/lat or lat/lon) and assume they are handled by the users 
themselves when writing / reading the files. You can always use 
[`ST_FlipCoordinates`](../../api/sql/Function.md#st_flipcoordinates) to swap 
the axis order of your geometries.
+When no `geoparquet.crs` option is explicitly provided, Sedona will 
automatically derive the CRS PROJJSON from the SRID of the geometry column. For 
example, if all geometries in a column have SRID 4326 (set via 
[`ST_SetSRID`](../../api/sql/Function.md#st_setsrid)), the writer will 
automatically produce the PROJJSON for EPSG:4326 in the GeoParquet metadata.

Review Comment:
   This statement is misleading. According to the implementation and PR 
description, when geometries have SRID 4326, the CRS field is **omitted** from 
the GeoParquet metadata (since 4326/OGC:CRS84 is the GeoParquet spec default), 
not written as PROJJSON. Consider revising to: "When no geoparquet.crs option 
is explicitly provided, Sedona will automatically derive the CRS PROJJSON from 
the SRID of the geometry column. For example, if all geometries in a column 
have SRID 32632, the writer will automatically produce the PROJJSON for 
EPSG:32632. For SRID 4326, the CRS field is omitted since this is the 
GeoParquet default."
   ```suggestion
   When no `geoparquet.crs` option is explicitly provided, Sedona will 
automatically derive the CRS PROJJSON from the SRID of the geometry column. For 
example, if all geometries in a column have SRID 32632 (set via 
[`ST_SetSRID`](../../api/sql/Function.md#st_setsrid)), the writer will 
automatically produce the PROJJSON for EPSG:32632 in the GeoParquet metadata. 
For SRID 4326, the CRS field is omitted since this is the GeoParquet default 
(OGC:CRS84).
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to