Hello,

There is an emerging spec[1] for how to store geospatial data in Arrow
and pass it through Parquet files in the geopandas world. There is
also a new R package, sfarrow, that implements a wrapper to do the
same in R[2]. Both define a serialization[3] for storing geospatial
data as an Arrow table (and thus also when saving to Parquet with
Arrow).

I can see a number of ways that we might interact with standards like
this one, and whichever of them we pursue, it would be good to state
that clearly in our docs:

1. Point to the standard: we could mention that this standard exists
and that anyone building a geospatial-aware application _could_ refer
to it if they want to.
2. Adopt a/this standard: this could range from stating that we've
adopted it as the way that spatial data _ought_ to be stored, to
asking the creators whether it would be better maintained within the
Arrow project itself (either by adopting it or by creating a fork; of
course, communication with the folks working on it now would be
critical!).
3. Create extension type(s) for geospatial data: this would require
adopting a standard like the one linked, and on top of that providing
an extension type within Arrow itself that the various clients could
implement as they see fit.
4. Create new, fully separate type(s) for geospatial data: again, this
would require adopting a standard of some sort, but we would implement
it as a dedicated type and presumably support it in as many of the
clients as we can.

There are of course pros and cons to all of these. This type of data
*is* somewhat specialized, and I don't think we want a huge profusion
of types for every specialized kind of data out there. But at a
minimum we should acknowledge (or adopt) a standard if one exists and
encourage implementations that use Arrow to follow it (as sfarrow does
to stay compatible with geopandas), so that there is some level of
interoperability and people don't need to reinvent the wheel each time
they store spatial data.

Thoughts? Are there other projects out there that already do something
like this with Arrow that we should consider?

[1] https://github.com/geopandas/geo-arrow-spec/pull/2
[2] https://github.com/wcjochem/sfarrow
[3] For now they create a binary WKB column and attach a bit of
metadata to the schema recording that encoding, though there are other
ways one could encode this, and the spec might include other way(s) to
store this data in the future.
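
For anyone unfamiliar with what [3] amounts to in practice, here is a
small standard-library-only sketch: it encodes 2D points as WKB bytes
(the values that would sit in the binary column) and builds a JSON
metadata entry of the kind that could be attached to the schema. The
helper names and the metadata key and field names ("geo",
"primary_column", "encoding") are assumptions for illustration; the
spec linked in [1] is the authority on the actual layout.

```python
import json
import struct


def point_to_wkb(x, y):
    """Encode a 2D point as little-endian WKB (21 bytes)."""
    # Byte-order flag (1 = little endian), geometry type (1 = Point),
    # then x and y as 64-bit floats.
    return struct.pack("<BIdd", 1, 1, x, y)


def wkb_to_point(buf):
    """Decode a 2D WKB point back into an (x, y) tuple."""
    order = "<" if buf[0] == 1 else ">"
    geom_type, x, y = struct.unpack(order + "Idd", buf[1:21])
    if geom_type != 1:
        raise ValueError("not a WKB Point")
    return x, y


# Values for the binary geometry column.
geometry_column = [point_to_wkb(-122.3, 47.6), point_to_wkb(2.35, 48.85)]

# Schema-level metadata noting how the column is encoded.
# Key and field names here are hypothetical placeholders.
schema_metadata = {
    b"geo": json.dumps(
        {
            "primary_column": "geometry",
            "columns": {"geometry": {"encoding": "WKB"}},
        }
    ).encode("utf-8")
}
```

A reader that finds the metadata entry knows to decode the column as
WKB; one that does not still gets a valid table with an opaque binary
column.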

-Jon
