Kontinuation commented on code in PR #1162:
URL: https://github.com/apache/sedona/pull/1162#discussion_r1436879122
##########
spark/spark-3.5/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/GeoParquetWriteSupport.scala:
##########
@@ -172,9 +186,10 @@ class GeoParquetWriteSupport extends WriteSupport[InternalRow] with Logging {
val bbox = if (geometryTypes.nonEmpty) {
Seq(columnInfo.bbox.minX, columnInfo.bbox.minY,
columnInfo.bbox.maxX, columnInfo.bbox.maxY)
} else Seq(0.0, 0.0, 0.0, 0.0)
- columnName -> GeometryFieldMetaData("WKB", geometryTypes, bbox)
+ val crs = geoParquetColumnCrsMap.get(columnName).orElse(defaultGeoParquetCrs)
Review Comment:
Currently the `crs` field is always present (written as `null`). This is
because early versions of geopandas (for example, 0.10.2, which the Python
tests use) cannot read GeoParquet files that lack `crs` metadata:
```python
>>> geopandas.read_parquet('gp_sample2.parquet')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/homebrew/lib/python3.11/site-packages/geopandas/io/arrow.py", line 461, in _read_parquet
    return _arrow_to_geopandas(table)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/geopandas/io/arrow.py", line 318, in _arrow_to_geopandas
    _validate_metadata(metadata)
  File "/opt/homebrew/lib/python3.11/site-packages/geopandas/io/arrow.py", line 162, in _validate_metadata
    raise ValueError(
ValueError: 'geo' metadata in Parquet/Feather file is missing required key 'crs' for column 'geometry'
```
The `crs` field is optional in recent versions of the GeoParquet standard,
and setting it to `null` has a different meaning from omitting it. We can
omit `crs` by default. This requires us to upgrade geopandas to 0.13.2 and to
drop support for Python 3.7, because geopandas has not supported Python 3.7
since 0.11.
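
To make the distinction concrete, here is a minimal Python sketch that classifies the `crs` entry of a single column in the `geo` metadata JSON. The helper name `describe_crs` and the sample metadata are hypothetical and not part of Sedona or geopandas. The comments about what each case means reflect my reading of the GeoParquet 1.0 specification and should be checked against it.

```python
import json


def describe_crs(column_meta: dict) -> str:
    """Classify the `crs` entry of one GeoParquet column metadata dict.

    Hypothetical helper for illustration only.
    """
    if "crs" not in column_meta:
        # Key absent: readers fall back to the default CRS (OGC:CRS84,
        # as I understand the GeoParquet 1.0 spec).
        return "omitted"
    if column_meta["crs"] is None:
        # Key present with JSON null: the CRS is explicitly unknown.
        return "null"
    # Anything else is expected to be a PROJJSON object.
    return "projjson"


# Sample `geo` metadata as currently written: `crs` is always present as null.
current = json.loads(
    '{"columns": {"geometry": {"encoding": "WKB", "geometry_types": [], "crs": null}}}'
)
# Sample `geo` metadata with `crs` omitted, as proposed for the default.
proposed = json.loads(
    '{"columns": {"geometry": {"encoding": "WKB", "geometry_types": []}}}'
)

print(describe_crs(current["columns"]["geometry"]))
print(describe_crs(proposed["columns"]["geometry"]))
```

A reader that requires the `crs` key, such as the geopandas version in the traceback above, would reject the second form.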
The value the user specifies for the `geoparquet.crs` or
`geoparquet.crs.<column_name>` option can be one of the following:
* `""` (empty string): omit the `crs` metadata
* `"null"`: explicitly set `crs` to `null`
* `"{...PROJJSON...}"`: explicitly set `crs` to the specified PROJJSON object
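
The mapping from option value to written metadata can be sketched as follows. This is an illustration in Python rather than the Scala code in this PR: the function names, the `_OMIT` sentinel, and the sample PROJJSON fragment are all hypothetical. It assumes that a per-column option takes precedence over `geoparquet.crs`, mirroring the `get(columnName).orElse(...)` lookup in the diff, and that the default is the empty string.

```python
import json
from typing import Any, Dict

# Sentinel meaning "do not write a crs key at all" (hypothetical).
_OMIT = object()


def parse_crs_option(value: str) -> Any:
    """Turn an option string into _OMIT, None, or a parsed PROJJSON dict."""
    if value == "":
        return _OMIT           # omit the crs metadata
    if value == "null":
        return None            # write crs explicitly as null
    return json.loads(value)   # write the given PROJJSON object


def resolve_crs(column: str, options: Dict[str, str]) -> Any:
    """Prefer the per-column option, then the global one, then omit."""
    default = options.get("geoparquet.crs", "")
    return parse_crs_option(options.get(f"geoparquet.crs.{column}", default))


def column_metadata(column: str, options: Dict[str, str]) -> Dict[str, Any]:
    """Build a simplified column metadata dict (only encoding and crs)."""
    meta: Dict[str, Any] = {"encoding": "WKB"}
    crs = resolve_crs(column, options)
    if crs is not _OMIT:
        meta["crs"] = crs
    return meta


opts = {
    "geoparquet.crs": "null",
    "geoparquet.crs.geom_a": '{"type": "GeographicCRS", "name": "example"}',
    "geoparquet.crs.geom_b": "",
}
for name in ("geom_a", "geom_b", "geom_c"):
    print(name, json.dumps(column_metadata(name, opts)))
```

With these options, `geom_a` gets the PROJJSON object, `geom_b` has no `crs` key, and `geom_c` falls back to the global setting and is written with `crs` as `null`.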