paleolimbot commented on PR #45459: URL: https://github.com/apache/arrow/pull/45459#issuecomment-2744994569
Ok! Thank you for taking another look! I know this is a big PR 😬 In addition to the smaller comments, the two high-level changes here are: - I removed the `GeoCrsContext` from the `ArrowWriterProperties`. I think there are a lot of things to consider with that particular change and I am not sure this PR is the place to do them. As is, this PR does a direct mapping from the GeoArrow `"crs": ...` to the Parquet logical type `crs`. This allows a high-level producer to implement the recommendation in the format (by writing a GeoArrow CRS of `projjson:some_field` and writing a schema field that contains the appropriate value) but will by default write whatever was in the GeoArrow crs into the Parquet crs. The reverse direction (Parquet -> GeoArrow) does the same thing but does check for the two recommended cases (i.e., it checks in the schema metadata for `some_field` if it sees `projjson:some_field`). This aligns with verbal feedback from @jorisvandenbossche (but I'll let him speak for himself if he has the bandwidth to review this!). - I made the NaN handling explicit/added tests. NaNs in coordinates are not supposed to happen (except for POINT EMPTY) and other bounding implementations do not have consistent behaviour here. I tried to align the behaviour with what I observed from this Parquet implementation (i.e., NaNs are not included in min/max stats, and if NaNs are read from Thrift they are treated as if no statistics were written). The "treating as if no statistics are written" is not quite as easy to map here because our geo stats contain information about more than one min/max pair, but I at least made sure that nans stay NaNs if they show up in Thrift. I updated the example in the PR Description as well (some of the comments were out of date). One should be able to swap in an arbitrary GeoPandas GeoDataFrame to give this an interactive spin based on that example. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org