laskoviymishka commented on issue #1090:
URL: https://github.com/apache/iceberg-go/issues/1090#issuecomment-4482533843
Good investigation!
Here’s what I’d suggest as a plan:
1. **Generate locally, then pin the bytes.**
Add a small reproducible generator under something like
`table/testdata/geo/generate/`. `pyarrow + geoarrow-pyarrow` seems totally fine
here — probably the most ergonomic stack for WKB + GeoArrow extension metadata.
I’d check in both the script and the generated `.parquet` files. The
committed parquet files are the actual contract; the script is just there so
future readers can see how they were produced and regenerate them if needed.
Keep it lightweight: no package-manager setup, just README instructions with
the `pip install` command and the package versions used.
2. **Keep the fixture set small and intentional.**
A few well-chosen cases are more useful than trying to cover everything.
Exhaustive conformance is GeoArrow’s job, not Iceberg’s. I’d start with
something like:
* point column, geometry, WGS84
* polygon column, geometry, WGS84
* geography variant of one of those
* mixed geometry types in one column
* nulls plus one empty geometry
3. **Document the upstream migration path.**
In the README, call out that once Apache Iceberg or
`apache/parquet-testing` has canonical geo fixtures, we should replace these
locally generated files with upstream-pinned bytes. That makes it an explicit
follow-up, not quiet tech debt.
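To make the cases in item 2 a bit more concrete, here is a rough stdlib-only Go sketch of the WKB payloads a point, a polygon, and an empty geometry would carry. It is not the generator (that would still be the pyarrow script); `wkbPointBytes` and `wkbPolygonBytes` are hypothetical helper names, and encoding `POINT EMPTY` as NaN coordinates is an assumption based on the common convention rather than something the fixtures are required to use.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"math"
)

// WKB geometry type codes for 2D geometries.
const (
	wkbPoint   uint32 = 1
	wkbPolygon uint32 = 3
)

// wkbPointBytes encodes a 2D point as little-endian WKB.
// Hypothetical helper, for illustration only.
// Writes to a bytes.Buffer with fixed-size values do not fail,
// so the binary.Write errors are ignored here.
func wkbPointBytes(x, y float64) []byte {
	var buf bytes.Buffer
	buf.WriteByte(1) // byte order flag: 1 = little endian
	binary.Write(&buf, binary.LittleEndian, wkbPoint)
	binary.Write(&buf, binary.LittleEndian, x)
	binary.Write(&buf, binary.LittleEndian, y)
	return buf.Bytes()
}

// wkbPolygonBytes encodes a 2D polygon as little-endian WKB.
// Each ring is a list of [x, y] pairs; zero rings gives POLYGON EMPTY.
func wkbPolygonBytes(rings [][][2]float64) []byte {
	var buf bytes.Buffer
	buf.WriteByte(1) // byte order flag: 1 = little endian
	binary.Write(&buf, binary.LittleEndian, wkbPolygon)
	binary.Write(&buf, binary.LittleEndian, uint32(len(rings)))
	for _, ring := range rings {
		binary.Write(&buf, binary.LittleEndian, uint32(len(ring)))
		for _, pt := range ring {
			binary.Write(&buf, binary.LittleEndian, pt[0])
			binary.Write(&buf, binary.LittleEndian, pt[1])
		}
	}
	return buf.Bytes()
}

func main() {
	pt := wkbPointBytes(-122.4, 37.8)
	// Assumption: POINT EMPTY represented with NaN coordinates.
	emptyPt := wkbPointBytes(math.NaN(), math.NaN())
	emptyPoly := wkbPolygonBytes(nil)
	fmt.Printf("point=%d bytes, empty point=%d bytes, empty polygon=%d bytes\n",
		len(pt), len(emptyPt), len(emptyPoly))
}
```

Something like this could later back byte-level assertions on the WKB column once the geo types land, but for the first PR it is only a reference for what the fixture rows contain.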
Scope-wise, I’d keep the PR to: generator script, the small set of generated
`.parquet` files, and a simple Go loader test that opens each file and checks
it parses. No geo-specific assertions yet, since the geo type plumbing hasn’t
landed.
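As a rough idea of how small the loader check can be, here is a stdlib-only sketch that only verifies the Parquet envelope (the `PAR1` magic at both ends and a footer length that fits in the file). The real test would more likely open each file with the project's Parquet reader; `checkParquetEnvelope`, `checkFixtureDir`, and the `table/testdata/geo` fixture directory are assumptions for illustration.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// parquetMagic opens and closes every unencrypted Parquet file.
var parquetMagic = []byte("PAR1")

// checkParquetEnvelope verifies the leading and trailing magic and that
// the declared footer length fits inside the file. It does not decode
// the footer metadata, so it is a sanity check, not a full parse.
func checkParquetEnvelope(data []byte) error {
	// Smallest possible layout: magic(4) + footer length(4) + magic(4).
	if len(data) < 12 {
		return errors.New("file too short to be parquet")
	}
	if !bytes.HasPrefix(data, parquetMagic) || !bytes.HasSuffix(data, parquetMagic) {
		return errors.New("missing PAR1 magic")
	}
	// The 4 bytes before the trailing magic hold the footer length (little endian).
	footerLen := binary.LittleEndian.Uint32(data[len(data)-8 : len(data)-4])
	if uint64(footerLen) > uint64(len(data)-12) {
		return fmt.Errorf("footer length %d exceeds file size", footerLen)
	}
	return nil
}

// checkFixtureDir runs the envelope check on every .parquet file in dir.
func checkFixtureDir(dir string) error {
	paths, err := filepath.Glob(filepath.Join(dir, "*.parquet"))
	if err != nil {
		return err
	}
	if len(paths) == 0 {
		return fmt.Errorf("no .parquet fixtures found in %s", dir)
	}
	for _, p := range paths {
		data, err := os.ReadFile(p)
		if err != nil {
			return err
		}
		if err := checkParquetEnvelope(data); err != nil {
			return fmt.Errorf("%s: %w", p, err)
		}
	}
	return nil
}

func main() {
	// Assumed fixture location; adjust to wherever the files are committed.
	if err := checkFixtureDir("table/testdata/geo"); err != nil {
		fmt.Println("fixture check failed:", err)
		return
	}
	fmt.Println("all fixtures look like parquet files")
}
```

In the actual PR this would sit in a `_test.go` file and fail the test instead of printing, and the envelope check could be swapped for a real reader call once that is wired up.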
That gets the fixtures into the tree, gives #984 and the downstream PRs
something to reference, and we can tighten the assertions as the feature lands.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]