2010YOUY01 commented on PR #560:
URL: https://github.com/apache/sedona-db/pull/560#issuecomment-3835471153
This PR has been reworked. The TL;DR for the option semantics:
1. For a regular Parquet file with a binary column that is physically
WKB-encoded, use this option to mark the binary column as geometry:
```
sd.read_parquet(
    "geo_legacy.parquet",
    geometry_columns={
        "geometry": {"encoding": "WKB", "crs": "EPSG:4326", "edges": "planar"}
    },
)
```
2. If a column is already a geometry column (inferred from Parquet metadata),
this option can supply optional fields that are missing. If a field is already
inferred from metadata and is set again through the option, an error occurs.
This feels safer to me, but I'm open to other opinions.
```
# Options inferred from metadata:
# {"encoding": "WKB"}  # "crs" is missing
# Providing the missing 'crs' through `geometry_columns` is allowed
sd.read_parquet(
    "geo.parquet",
    geometry_columns={
        "geometry": {"crs": "EPSG:4326"}
    },
)
# Now 'geometry' is a geometry column with crs=EPSG:4326
```
```
# Options inferred from metadata:
# {"encoding": "WKB", "crs": "EPSG:4326"}
# Providing an option that is already inferred from the metadata is not allowed
sd.read_parquet(
    "geo.parquet",
    geometry_columns={
        "geometry": {"crs": "EPSG:3857"}
    },
)
# Errors...
```
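The rule in the two cases above can be modeled in a few lines of plain Python. This is only a sketch of the semantics as described in this comment, not the code in the PR: `merge_column_options` is a hypothetical helper, and it assumes that setting an already-inferred field is rejected even when the value matches.

```python
from typing import Optional


def merge_column_options(inferred: Optional[dict], provided: dict) -> dict:
    """Hypothetical model of the override rule; not a sedona-db function.

    `inferred` holds the fields read from the file metadata (None for a
    plain binary column), `provided` holds the user's `geometry_columns`
    entry for the same column.
    """
    merged = dict(inferred or {})
    for key, value in provided.items():
        # Assumption: any field already inferred from metadata is rejected,
        # regardless of whether the provided value is the same.
        if key in merged:
            raise ValueError(
                f"option '{key}' is already set by file metadata "
                f"({merged[key]!r}) and cannot be overridden"
            )
        merged[key] = value
    return merged


# Case 1: plain binary column, nothing inferred, user supplies everything.
legacy = merge_column_options(
    None, {"encoding": "WKB", "crs": "EPSG:4326", "edges": "planar"}
)

# Case 2: metadata lacks "crs", so the user may fill it in.
filled = merge_column_options({"encoding": "WKB"}, {"crs": "EPSG:4326"})
```

Rejecting every re-set field keeps the behavior predictable: the file metadata is never silently overridden, at the cost of forcing users to rewrite the file if its metadata is wrong.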
## Implementation/Key changes
```text
(existing)
geoparquet metadata --> (per col) GeoParquetColumnMetadata --> schema
(PR)
geoparquet metadata --> (per col) GeoParquetColumnMetadata ----+
                                                               | (combine)
                                                               |
user option geometry_columns --> GeoParquetColumnMetadata -----+--> schema
```
1. Parse the option with `serde_json::from_str`, the same way as the Parquet
metadata, and store the column options inside `GeoParquetFormat ->
TableGeoParquetOption`, since the `TableFormat` trait is used to build the
schema. When `infer_schema()` is called, combine the `GeoParquetColumnMetadata`
from both the metadata and the `geometry_columns` option.
2. Refactor `GeoParquetColumnMetadata` to make its `encoding` field optional.
Since the GeoParquet spec requires this field, assertions are added to the
existing deserializer to ensure it exists in the Parquet metadata.
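The two points above can be illustrated with a small Python model. It is a sketch under stated assumptions rather than a port of the Rust code: `ColumnMetadata`, `parse_column_options`, and `parse_file_metadata` are hypothetical names, `crs` is simplified to a string, and the input is a flat column-name-to-fields JSON object rather than the full GeoParquet metadata document.

```python
import json
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ColumnMetadata:
    """Simplified stand-in for GeoParquetColumnMetadata (hypothetical)."""

    encoding: Optional[str] = None  # optional here, required in file metadata
    crs: Optional[str] = None
    edges: Optional[str] = None


def parse_column_options(raw: str) -> Dict[str, ColumnMetadata]:
    """Parse a user-supplied JSON option; every field may be omitted."""
    return {
        name: ColumnMetadata(**fields)
        for name, fields in json.loads(raw).items()
    }


def parse_file_metadata(raw: str) -> Dict[str, ColumnMetadata]:
    """Parse file metadata with the same parser, then enforce `encoding`."""
    columns = parse_column_options(raw)
    for name, column in columns.items():
        # The GeoParquet spec requires `encoding`, so a file without it is invalid.
        if column.encoding is None:
            raise ValueError(f"column '{name}' is missing required 'encoding'")
    return columns


# A user option may carry only the fields to fill in.
user_options = parse_column_options('{"geometry": {"crs": "EPSG:4326"}}')

# File metadata must always declare the encoding.
file_columns = parse_file_metadata('{"geometry": {"encoding": "WKB"}}')
```

Sharing one data type between both sources is what makes the combine step in the diagram a field-by-field merge; the stricter check lives only on the file-metadata path.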