paleolimbot commented on code in PR #8222:
URL: https://github.com/apache/arrow-rs/pull/8222#discussion_r2299657546


##########
parquet/src/basic.rs:
##########
@@ -894,8 +906,10 @@ impl From<LogicalType> for parquet::LogicalType {
             LogicalType::Uuid => 
parquet::LogicalType::UUID(Default::default()),
             LogicalType::Float16 => 
parquet::LogicalType::FLOAT16(Default::default()),
             LogicalType::Variant => 
parquet::LogicalType::VARIANT(Default::default()),
-            LogicalType::Geometry => 
parquet::LogicalType::GEOMETRY(Default::default()),
-            LogicalType::Geography => 
parquet::LogicalType::GEOGRAPHY(Default::default()),
+            LogicalType::Geometry { crs } => 
parquet::LogicalType::GEOMETRY(GeometryType { crs }),
+            LogicalType::Geography { crs, algorithm } => {
+                parquet::LogicalType::GEOGRAPHY(GeographyType { crs, algorithm 
})
+            }

Review Comment:
   In Arrow C++ I tried to detect `crs` values that were definitely lon/lat and 
ensure the Parquet logical type was "unset" to maximize compatibility with 
readers that didn't want to or couldn't handle CRS values.
   
   - `"EPSG:4326"`
   - `"OGC:CRS84"`
   - `{..., "id": {"authority": "EPSG", "code": 4326}`
   - `{..., "id": {"authority": "EPSG", "code": "4326"}`
   - `{..., "id": {"authority": "OGC", "code": "CRS84"}`
   
   
https://github.com/apache/arrow/blob/f8b20f131a072ef423e81b8a676f42a82255f4ec/cpp/src/parquet/geospatial/util_json_internal.cc#L47-L81



##########
parquet/src/basic.rs:
##########
@@ -850,8 +859,11 @@ impl From<parquet::LogicalType> for LogicalType {
             parquet::LogicalType::UUID(_) => LogicalType::Uuid,
             parquet::LogicalType::FLOAT16(_) => LogicalType::Float16,
             parquet::LogicalType::VARIANT(_) => LogicalType::Variant,
-            parquet::LogicalType::GEOMETRY(_) => LogicalType::Geometry,
-            parquet::LogicalType::GEOGRAPHY(_) => LogicalType::Geography,
+            parquet::LogicalType::GEOMETRY(t) => LogicalType::Geometry { crs: 
t.crs },
+            parquet::LogicalType::GEOGRAPHY(t) => LogicalType::Geography {
+                crs: t.crs,
+                algorithm: t.algorithm,
+            },

Review Comment:
   In Arrow C++ this was a bit tricky to deal with:
   
   - `projjson:crs_key` defines a CRS where the PROJJSON payload is stored in 
the Parquet schema metadata. This requires passing the schema metadata into 
this function (or something between this LogicalType and the Arrow Field 
output), which by the looks of it is a bit tricky to do here. In Arrow C++ we 
do handle this case (i.e., we'll return read a Parquet `projjson:crs_key` as 
GeoArrow whose CRS payload is the actual PROJJSON that was in the Parquet 
schema metadata.
   - `srid:1234` can be converted to GeoArrow as `{"crs": "1234", "crs_type": 
"srid"}`.
   - `<any other string>` can be converted to GeoArrow as `{"crs": "<any other 
string>"}`. In Arrow C++ I also made the choice to ensure that if `<any other 
string>` happened to be valid JSON, that it shouldn't be double-escaped as a 
JSON string.
   
   There are test files with all three of these in the apache/parquet-testing 
repo 🙂 
   
   
https://github.com/apache/arrow/blob/f8b20f131a072ef423e81b8a676f42a82255f4ec/cpp/src/parquet/geospatial/util_json_internal.cc#L105-L125



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to