paleolimbot commented on PR #10065: URL: https://github.com/apache/arrow-rs/pull/10065#issuecomment-4653799290
> Is this a "breaking change" Definitely not in the "Rust code will stop compiling" sense. From a metadata semantics point of view I would consider this a bug fix...the currently emitted metadata would have either triggered errors or resulted in silently/subtly incorrect results in at least one common case (the default Parquet != GeoArrow default CRS), and intentionally invalid metadata is required to write geography. In order to have any of this code triggered one would need to build parquet with the geospatial feature enabled, which I am not sure is common (but I don't know this). That said there is a workaround if this needs to be punted (in SedonaDB we have a UDF that intentionally creates invalid GeoArrow field metadata to write valid Parquet output). > DO you have any suggestions on how to review this PR? I hope the below is helpful! > other implementations I can cross reference with Arrow C++ would be the other one: https://github.com/apache/arrow/blob/bfc9cdb2cc8e1dcbfef5e97db2e3e9f5a15fd356/cpp/src/parquet/geospatial/util_json_internal.cc#L35-L142 The only difference here is the `srid:0` conversion. That was only just clarified in the Parquet spec in the last few weeks and I haven't had a chance to PR it into that spec yet. > Like other implementations I can cross reference with The two CRS specs: - https://geoarrow.org/extension-types.html#extension-metadata - https://github.com/apache/parquet-format/blob/master/Geospatial.md#coordinate-reference-system > Is it right to have embedded quotes here? Good catch! I'll update. It works because we do try to parse the value as JSON first (to avoid taking a valid JSON object from the Parquet field, which is a string, escaping it, and re-inserting it into a the JSON GeoArrow metadata) but it's not what I was trying to test 😬 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
