2010YOUY01 commented on code in PR #560:
URL: https://github.com/apache/sedona-db/pull/560#discussion_r2758192176
##########
python/sedonadb/python/sedonadb/context.py:
##########
@@ -134,14 +135,60 @@ def read_parquet(
files.
options: Optional dictionary of options to pass to the Parquet
reader.
For S3 access, use {"aws.skip_signature": True, "aws.region":
"us-west-2"} for anonymous access to public buckets.
+ geometry_columns: Optional JSON string mapping column name to
+ GeoParquet column metadata (e.g.,
+ '{"geom": {"encoding": "WKB"}}'). Use this to mark binary WKB
+ columns as geometry columns or correct metadata such as the
+ column CRS.
+
+ Supported keys (others in the spec are not implemented):
+ - encoding: "WKB" (required if the column is not already geometry)
+ - crs: (e.g., "EPSG:4326")
+ - edges: "planar" (default) or "spherical"
+ See spec for details: https://geoparquet.org/releases/v1.1.0/
+
+ Useful for:
+ - Legacy Parquet files with Binary columns containing WKB payloads.
+ - Overriding GeoParquet metadata when fields like `crs` are missing.
+
+ Precedence:
+ - GeoParquet metadata is used to infer geometry columns first.
+ - geometry_columns then overrides the auto-inferred schema:
+ - If a column is not geometry in metadata but appears in
+ geometry_columns, it is treated as a geometry column.
+ - If a column is geometry in metadata and also appears in
+ geometry_columns, only the provided keys override; other
+ fields remain as inferred. If a key already exists in metadata
+ and is provided again with a different value, an error is
+ returned.
+
+ Example:
+ - For `geo.parquet(geo1: geometry, geo2: geometry, geo3: binary)`,
+ `read_parquet("geo.parquet", geometry_columns='{"geo2": {"encoding": "WKB"}, "geo3": {"encoding": "WKB"}}')`
+ overrides `geo2` metadata and treats `geo3` as a geometry column.
+ - If `geo` inferred from metadata has:
+ - `geo: {"encoding": "wkb", "crs": None, "edges": "spherical"...}`
+ and geometry_columns provides:
+ - `geo: {"crs": 4326}`
+ then the result is (only override provided keys):
+ - `geo: {"encoding": "wkb", "crs": "EPSG:4326", "edges": "spherical"...}`
+ - If `geo` inferred from metadata has:
+ - `geo: {"encoding": "wkb", "crs": "EPSG:4326"}`
+ and geometry_columns provides:
+ - `geo: {"crs": "EPSG:3857"}`
+ an error is returned for a conflicting key. This option is only
+ allowed to provide missing optional fields in geometry columns.
Review Comment:
addressed in
[5113c2f](https://github.com/apache/sedona-db/pull/560/commits/5113c2fc8c5f1836f910fa6c22e750c199ebe65d)
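
For readers following the thread, the precedence rules in the docstring above can be sketched in plain Python. This is only an illustration of the documented behavior, not the code in the PR: `merge_geometry_columns` and its arguments are hypothetical names, the inferred metadata is modeled as a plain dict, and values are compared as-is (no CRS normalization such as `4326` to `"EPSG:4326"`, and no case folding of `"wkb"` vs `"WKB"`).

```python
import json


def merge_geometry_columns(inferred, geometry_columns):
    """Hypothetical helper: apply a geometry_columns JSON string on top of
    metadata inferred from the file, following the documented precedence."""
    overrides = json.loads(geometry_columns)
    # Copy so the caller's inferred metadata is not mutated.
    merged = {name: dict(meta) for name, meta in inferred.items()}

    for name, provided in overrides.items():
        if name not in merged:
            # Column is not geometry in the file metadata: treat it as a new
            # geometry column, which requires an explicit encoding.
            if "encoding" not in provided:
                raise ValueError(f"column {name!r} needs an 'encoding' key")
            merged[name] = dict(provided)
            continue

        current = merged[name]
        for key, value in provided.items():
            existing = current.get(key)
            # A key that is already set may only be repeated with the same
            # value; a different value is a conflict.
            if existing is not None and existing != value:
                raise ValueError(
                    f"conflicting value for {name!r}.{key}: "
                    f"{existing!r} vs {value!r}"
                )
            current[key] = value  # fill in a missing field

    return merged


inferred = {"geo": {"encoding": "wkb", "crs": None, "edges": "spherical"}}
result = merge_geometry_columns(inferred, '{"geo": {"crs": "EPSG:4326"}}')
print(result["geo"])
```

Under these assumptions, only the missing `crs` is filled in while `encoding` and `edges` stay as inferred; passing `'{"geo": {"crs": "EPSG:3857"}}'` against metadata that already has `"EPSG:4326"` raises a `ValueError`, matching the last example in the docstring.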