paleolimbot commented on code in PR #8222: URL: https://github.com/apache/arrow-rs/pull/8222#discussion_r2299657546
########## parquet/src/basic.rs: ########## @@ -894,8 +906,10 @@ impl From<LogicalType> for parquet::LogicalType { LogicalType::Uuid => parquet::LogicalType::UUID(Default::default()), LogicalType::Float16 => parquet::LogicalType::FLOAT16(Default::default()), LogicalType::Variant => parquet::LogicalType::VARIANT(Default::default()), - LogicalType::Geometry => parquet::LogicalType::GEOMETRY(Default::default()), - LogicalType::Geography => parquet::LogicalType::GEOGRAPHY(Default::default()), + LogicalType::Geometry { crs } => parquet::LogicalType::GEOMETRY(GeometryType { crs }), + LogicalType::Geography { crs, algorithm } => { + parquet::LogicalType::GEOGRAPHY(GeographyType { crs, algorithm }) + } Review Comment: In Arrow C++ I tried to detect `crs` values that were definitely lon/lat and ensure the Parquet logical type was "unset" to maximize compatibility with readers that didn't want to or couldn't handle CRS values. - `"EPSG:4326"` - `"OGC:CRS84"` - `{..., "id": {"authority": "EPSG", "code": 4326}` - `{..., "id": {"authority": "EPSG", "code": "4326"}` - `{..., "id": {"authority": "OGC", "code": "CRS84"}` https://github.com/apache/arrow/blob/f8b20f131a072ef423e81b8a676f42a82255f4ec/cpp/src/parquet/geospatial/util_json_internal.cc#L47-L81 ########## parquet/src/basic.rs: ########## @@ -850,8 +859,11 @@ impl From<parquet::LogicalType> for LogicalType { parquet::LogicalType::UUID(_) => LogicalType::Uuid, parquet::LogicalType::FLOAT16(_) => LogicalType::Float16, parquet::LogicalType::VARIANT(_) => LogicalType::Variant, - parquet::LogicalType::GEOMETRY(_) => LogicalType::Geometry, - parquet::LogicalType::GEOGRAPHY(_) => LogicalType::Geography, + parquet::LogicalType::GEOMETRY(t) => LogicalType::Geometry { crs: t.crs }, + parquet::LogicalType::GEOGRAPHY(t) => LogicalType::Geography { + crs: t.crs, + algorithm: t.algorithm, + }, Review Comment: In Arrow C++ this was a bit tricky to deal with: - `projjson:crs_key` defines a CRS where the PROJJSON payload is stored in the Parquet schema metadata. This requires passing the schema metadata into this function (or something between this LogicalType and the Arrow Field output), which by the looks of it is a bit tricky to do here. In Arrow C++ we do handle this case (i.e., we'll return read a Parquet `projjson:crs_key` as GeoArrow whose CRS payload is the actual PROJJSON that was in the Parquet schema metadata. - `srid:1234` can be converted to GeoArrow as `{"crs": "1234", "crs_type": "srid"}`. - `<any other string>` can be converted to GeoArrow as `{"crs": "<any other string>"}`. In Arrow C++ I also made the choice to ensure that if `<any other string>` happened to be valid JSON, that it shouldn't be double-escaped as a JSON string. There are test files with all three of these in the apache/parquet-testing repo 🙂 https://github.com/apache/arrow/blob/f8b20f131a072ef423e81b8a676f42a82255f4ec/cpp/src/parquet/geospatial/util_json_internal.cc#L105-L125 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org