SaymV opened a new pull request, #3067:
URL: https://github.com/apache/iceberg-python/pull/3067
1. Adds geospatial bounds metric computation from WKB values (geometry +
geography).
2. Adds spatial predicate expression/binding support (`st-contains`,
`st-intersects`, `st-within`, `st-overlaps`) with conservative evaluator
behavior.
3. Improves Arrow/Parquet interoperability for GeoArrow WKB, including
explicit handling of geometry vs planar-geography ambiguity at the
schema-compatibility boundary.
This increment is compatibility-first and does **not** introduce new runtime
dependencies.
Base `geometry`/`geography` types existed, but there were still practical
gaps:
- Geospatial columns were not contributing spec-encoded bounds in data-file
metrics.
- Spatial predicates were not modeled end-to-end in expression
binding/visitor plumbing.
- GeoArrow metadata can be ambiguous for `geometry` vs `geography(...,
"planar")`, causing false compatibility failures during import/add-files flows.
- Added pure-Python geospatial utilities in `pyiceberg/utils/geospatial.py`:
  - WKB envelope extraction
  - antimeridian-aware geography envelope merge
  - Iceberg geospatial bound serialization/deserialization
- Added `GeospatialStatsAggregator` and geospatial aggregate helpers in
`pyiceberg/io/pyarrow.py`.
- Updated write/import paths to compute geospatial bounds from actual row
  values (not Parquet binary min/max stats):
  - `write_file(...)`
  - `parquet_file_to_data_file(...)`
- Prevented incorrect partition inference from geospatial envelope bounds.
- Added expression types in `pyiceberg/expressions/__init__.py`:
  - `STContains`, `STIntersects`, `STWithin`, `STOverlaps`
  - bound counterparts and JSON parsing support
- Added visitor dispatch/plumbing in `pyiceberg/expressions/visitors.py`.
- Behavior is intentionally conservative in this increment:
  - the row-level expression evaluator raises `NotImplementedError`
  - manifest/metrics evaluators return conservative might-match defaults
  - translation paths preserve spatial predicates where possible
- Added GeoArrow WKB decoding helper in `pyiceberg/io/pyarrow.py` to map
extension metadata to Iceberg geospatial types.
- Added a boundary-only compatibility option in `pyiceberg/schema.py`:
  - `_check_schema_compatible(..., allow_planar_geospatial_equivalence=False)`
- Enabled that option only in `_check_pyarrow_schema_compatible(...)` to
  allow:
  - `geometry` <-> `geography(..., "planar")` when CRS strings match
  - while still rejecting spherical geography mismatches
- Added a one-time warning log when `geoarrow-pyarrow` is unavailable and the
  code falls back to binary.
- Updated user docs: `mkdocs/docs/geospatial.md`
- Added decisions record: `mkdocs/docs/dev/geospatial-types-decisions-v1.md`
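
To illustrate the WKB envelope extraction listed among the new utilities, here is a minimal pure-Python sketch. It is not the code in `pyiceberg/utils/geospatial.py`: the function name `wkb_envelope_xy` is hypothetical, and it assumes plain 2D ISO WKB restricted to Point, LineString, and Polygon (no Z/M, no multi-geometries, no empty-geometry handling).

```python
import struct
from typing import List, Tuple

# ISO WKB geometry type codes for the 2D cases handled by this sketch.
WKB_POINT, WKB_LINESTRING, WKB_POLYGON = 1, 2, 3


def wkb_envelope_xy(wkb: bytes) -> Tuple[float, float, float, float]:
    """Return (xmin, ymin, xmax, ymax) for a 2D Point, LineString, or Polygon.

    Hypothetical helper for illustration only.
    """
    # Byte 0 is the byte-order flag: 1 = little-endian, 0 = big-endian.
    endian = "<" if wkb[0] == 1 else ">"
    (geom_type,) = struct.unpack_from(endian + "I", wkb, 1)
    offset = 5
    xs: List[float] = []
    ys: List[float] = []

    def read_count() -> int:
        # Read a uint32 point/ring count and advance the cursor.
        nonlocal offset
        (count,) = struct.unpack_from(endian + "I", wkb, offset)
        offset += 4
        return count

    def read_coords(count: int) -> None:
        # Read `count` (x, y) pairs of float64 values.
        nonlocal offset
        for _ in range(count):
            x, y = struct.unpack_from(endian + "dd", wkb, offset)
            offset += 16
            xs.append(x)
            ys.append(y)

    if geom_type == WKB_POINT:
        read_coords(1)
    elif geom_type == WKB_LINESTRING:
        read_coords(read_count())
    elif geom_type == WKB_POLYGON:
        for _ in range(read_count()):  # one iteration per ring
            read_coords(read_count())
    else:
        raise ValueError(f"unsupported WKB geometry type: {geom_type}")

    if not xs:
        raise ValueError("geometry has no coordinates")
    return min(xs), min(ys), max(xs), max(ys)
```

Computing bounds this way reads every coordinate of every row value, which is why it differs from Parquet's binary min/max statistics: those order the raw WKB bytes and say nothing about spatial extent.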
Added/updated tests across:
- `tests/utils/test_geospatial.py`
- `tests/io/test_pyarrow_stats.py`
- `tests/io/test_pyarrow.py`
- `tests/expressions/test_spatial_predicates.py`
- `tests/integration/test_geospatial.py`
Coverage includes:
- geospatial bound encoding/decoding (XY/XYZ/XYM/XYZM)
- geography antimeridian behavior
- geospatial metrics generation from write/import paths
- spatial predicate modeling/binding/translation behavior
- planar ambiguity compatibility guardrails
- warning behavior for missing `geoarrow-pyarrow`
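
As a rough illustration of what the geography antimeridian behavior involves, the sketch below merges two envelopes whose longitude interval may wrap. It assumes envelopes are `(west, south, east, north)` tuples in degrees and that `west > east` marks an interval crossing the antimeridian. The function names are hypothetical and the PR's actual merge logic may differ.

```python
from typing import Tuple

Envelope = Tuple[float, float, float, float]  # (west, south, east, north)


def _lon_width(west: float, east: float) -> float:
    # Width in degrees; west > east means the interval wraps past +/-180.
    return east - west if west <= east else east - west + 360.0


def _covers(outer: Tuple[float, float], inner: Tuple[float, float]) -> bool:
    # True if the longitude interval `outer` fully contains `inner`.
    outer_width = _lon_width(*outer)
    if outer_width >= 360.0:
        return True
    start = (inner[0] - outer[0]) % 360.0  # inner's west, measured from outer's west
    return start + _lon_width(*inner) <= outer_width


def merge_geography_envelopes(a: Envelope, b: Envelope) -> Envelope:
    """Merge two envelopes, choosing the narrowest longitude span covering both."""
    lon_a, lon_b = (a[0], a[2]), (b[0], b[2])
    best = (-180.0, 180.0)  # fall back to the full longitude range
    for west in (lon_a[0], lon_b[0]):
        for east in (lon_a[1], lon_b[1]):
            candidate = (west, east)
            if (
                _covers(candidate, lon_a)
                and _covers(candidate, lon_b)
                and _lon_width(*candidate) < _lon_width(*best)
            ):
                best = candidate
    # Latitude never wraps, so a plain min/max is enough.
    return best[0], min(a[1], b[1]), best[1], max(a[3], b[3])
```

For example, merging `(170.0, 0.0, 180.0, 5.0)` with `(-180.0, -2.0, -170.0, 3.0)` keeps a 20-degree span with `west=170.0` and `east=-170.0`, whereas a naive min/max over longitudes would widen the result to the whole globe.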
- No user-facing API removals.
- New compatibility relaxation is intentionally scoped to Arrow/Parquet
schema-compatibility boundary only.
- Core schema/type compatibility remains strict elsewhere.
- No spatial pushdown/row execution implementation in this PR.
- Spatial predicate execution semantics.
- Spatial predicate pushdown/pruning.
- Runtime WKB <-> WKT conversion strategy.
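
Until execution and pruning are implemented, the conservative evaluator behavior described earlier amounts to "never drop a file, never pretend to evaluate a row". The sketch below shows that contract with stand-in names: `SpatialPredicate`, `evaluate_row`, and `might_match_file` are hypothetical and are not the classes or visitors added in `pyiceberg/expressions`.

```python
from dataclasses import dataclass
from typing import Any, Dict

ROWS_MIGHT_MATCH = True  # conservative answer: the file cannot be ruled out


@dataclass(frozen=True)
class SpatialPredicate:
    """Hypothetical stand-in for a bound spatial predicate."""

    op: str        # e.g. "st-intersects"
    column: str    # name of the geometry/geography column
    literal: bytes  # WKB-encoded query geometry


def evaluate_row(predicate: SpatialPredicate, row: Dict[str, Any]) -> bool:
    # Row-level semantics are out of scope, so fail loudly instead of guessing.
    raise NotImplementedError(
        f"row-level evaluation of {predicate.op} is not supported yet"
    )


def might_match_file(
    predicate: SpatialPredicate,
    lower_bounds: Dict[str, bytes],
    upper_bounds: Dict[str, bytes],
) -> bool:
    # No pruning yet: bounds are ignored and every file is kept for scanning.
    return ROWS_MIGHT_MATCH
```

Returning might-match from the metrics path is safe because it can only cause extra files to be read, never cause matching rows to be skipped.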
# Rationale for this change
## Are these changes tested?
## Are there any user-facing changes?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]