jorisvandenbossche commented on PR #45459: URL: https://github.com/apache/arrow/pull/45459#issuecomment-2749495657
To confirm, the segfault I mentioned is no longer present with the latest change to remove `GeoCrsContext` handling.

What triggered it with an older version of this branch:

```python
import geopandas
import pyarrow as pa
import pyarrow.feather

# creation of the file
gdf = geopandas.GeoDataFrame(
    geometry=geopandas.points_from_xy([1, 2, 3], [1, 2, 3]), crs="EPSG:3857"
)
table = pa.table(gdf.to_arrow())
pa.feather.write_feather(table, "test_geometry_3857.arrow")
```

And then writing that Arrow table to Parquet segfaulted when the extension type was enabled:

```python
In [1]: import pyarrow.feather

In [2]: table = pa.feather.read_table("test_geometry_3857.arrow")

In [3]: table.schema
Out[3]:
geometry: binary
  -- field metadata --
  ARROW:extension:name: 'geoarrow.wkb'
  ARROW:extension:metadata: '{"crs": "{\"$schema\":\"https://proj.org/sch' + 2613
-- schema metadata --
pandas: '{"index_columns": [{"kind": "range", "name": null, "start": 0, "' + 373

In [4]: import pyarrow.parquet as pq

# works fine when no geoarrow extension type is enabled
In [5]: pq.write_table(table, "test_geometry_from_table_no_ext.parquet")

In [6]: import geoarrow.pyarrow as ga

In [7]: table = pa.feather.read_table("test_geometry_3857.arrow")

In [8]: table.schema
Out[8]:
geometry: extension<geoarrow.wkb<WkbType>>
-- schema metadata --
pandas: '{"index_columns": [{"kind": "range", "name": null, "start": 0, "' + 373

# segfaults when it is enabled
In [9]: pq.write_table(table, "test_geometry_from_table_ext.parquet")
terminate called after throwing an instance of 'parquet::ParquetException'
  what():  Crs encoding 'unknown' is not suppored by GeoCrsContext
Aborted (core dumped)
```

Not sure if this is related to geopandas creating invalid crs metadata? (the bug you fixed on geopandas main)
In any case, the above now works without a segfault on the latest version of this branch.
This difference makes me wonder about one other thing, though, which is still present: for the Parquet code to "see" a geometry column in the Arrow data it is writing, it needs to be an actual _registered_ extension type, and having the extension metadata in the field metadata is not sufficient?
(Going purely by the spec, the metadata is what defines the extension type, so someone roundtripping Parquet data (e.g. reading in, doing some filtering, writing out again) would lose this type information if they do not have the extension type registered.)