Kontinuation commented on code in PR #1162:
URL: https://github.com/apache/sedona/pull/1162#discussion_r1436879122


##########
spark/spark-3.5/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/GeoParquetWriteSupport.scala:
##########
@@ -172,9 +186,10 @@ class GeoParquetWriteSupport extends WriteSupport[InternalRow] with Logging {
         val bbox = if (geometryTypes.nonEmpty) {
           Seq(columnInfo.bbox.minX, columnInfo.bbox.minY, columnInfo.bbox.maxX, columnInfo.bbox.maxY)
         } else Seq(0.0, 0.0, 0.0, 0.0)
-        columnName -> GeometryFieldMetaData("WKB", geometryTypes, bbox)
+        val crs = geoParquetColumnCrsMap.get(columnName).orElse(defaultGeoParquetCrs)

Review Comment:
   Currently the `crs` field is always present (written as `null`). This is 
because early versions of geopandas (for example, 0.10.2, used by the Python 
tests) cannot read GeoParquet files without `crs` metadata:
   
   ```python
   >>> geopandas.read_parquet('gp_sample2.parquet')
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
     File "/opt/homebrew/lib/python3.11/site-packages/geopandas/io/arrow.py", line 461, in _read_parquet
       return _arrow_to_geopandas(table)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/opt/homebrew/lib/python3.11/site-packages/geopandas/io/arrow.py", line 318, in _arrow_to_geopandas
       _validate_metadata(metadata)
     File "/opt/homebrew/lib/python3.11/site-packages/geopandas/io/arrow.py", line 162, in _validate_metadata
       raise ValueError(
   ValueError: 'geo' metadata in Parquet/Feather file is missing required key 'crs' for column 'geometry'
   ```
   
   The `crs` field is optional in recent versions of the GeoParquet standard, 
and setting it to `null` has a different meaning from omitting it: as I read 
the spec, an omitted `crs` implies the default OGC:CRS84, while `null` marks 
the CRS as unknown. We can omit `crs` by default. This requires us to upgrade 
geopandas to 0.13.2 and to drop support for Python 3.7, since geopandas itself 
dropped Python 3.7 support in 0.11.
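
To make the omitted-versus-`null` distinction concrete, here is a minimal Python sketch that classifies the `crs` entry of a single column in the `geo` metadata. The function name `crs_state` and the sample metadata strings are hypothetical and only for illustration; they are not taken from Sedona or geopandas.

```python
import json


def crs_state(column_meta: dict) -> str:
    """Classify how one geometry column's metadata declares its CRS."""
    if "crs" not in column_meta:
        # Key is absent: the writer omitted `crs` entirely.
        return "omitted"
    if column_meta["crs"] is None:
        # Key is present but JSON null: the writer set `crs` explicitly to null.
        return "null"
    # Anything else is treated as an explicit CRS definition (e.g. PROJJSON).
    return "explicit"


# Hypothetical per-column metadata, as it might appear inside the `geo` JSON.
with_null = json.loads('{"encoding": "WKB", "geometry_types": ["Point"], "crs": null}')
without_crs = json.loads('{"encoding": "WKB", "geometry_types": ["Point"]}')

print(crs_state(with_null))    # null
print(crs_state(without_crs))  # omitted
```

A reader that only checks `column_meta.get("crs") is None` would conflate the two cases, which is why the key-presence test comes first.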
   
   The value the user specifies for the `geoparquet.crs` or 
`geoparquet.crs.<column_name>` option can be one of the following:
   * `""` (empty string): omit the `crs` metadata
   * `"null"`: explicitly set `crs` to `null`
   * `"{...PROJJSON...}"`: explicitly set `crs` to the specified PROJJSON object
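
The three cases above could be handled roughly as in the following Python sketch. It is illustrative only and is not the Scala code in this PR: the helper name `apply_crs_option` is hypothetical, and the PROJJSON string in the usage lines is an abbreviated placeholder rather than a complete CRS definition.

```python
import json


def apply_crs_option(column_meta: dict, option_value: str) -> dict:
    """Return a copy of the column metadata with `crs` set per the option value."""
    result = dict(column_meta)
    if option_value == "":
        # Empty string: leave the `crs` key out altogether.
        result.pop("crs", None)
    elif option_value == "null":
        # Literal "null": keep the key and write JSON null.
        result["crs"] = None
    else:
        # Otherwise the value is expected to be a PROJJSON object.
        crs = json.loads(option_value)
        if not isinstance(crs, dict):
            raise ValueError("crs option must be '', 'null', or a JSON object")
        result["crs"] = crs
    return result


base = {"encoding": "WKB", "geometry_types": ["Point"]}
print(json.dumps(apply_crs_option(base, "")))
print(json.dumps(apply_crs_option(base, "null")))
# Abbreviated placeholder, not a full PROJJSON document.
print(json.dumps(apply_crs_option(base, '{"type": "GeographicCRS", "name": "WGS 84"}')))
```

Returning a copy keeps the input metadata unchanged, so the same base dictionary can be reused for every geometry column.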
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
