paleolimbot commented on code in PR #8801:
URL: https://github.com/apache/arrow-rs/pull/8801#discussion_r2505129949


##########
parquet-geospatial/src/crs.rs:
##########
@@ -0,0 +1,97 @@
+use std::{collections::HashMap, sync::Arc};
+
+use arrow_schema::{Schema, SchemaBuilder};
+use serde_json::{Value, json};
+
+#[derive(Debug)]
+pub enum Crs {
+    Projjson(serde_json::Value),
+    Srid(u64),
+    Other(String),
+}
+
+impl Crs {
+    // TODO: make fallible
+    fn try_from_parquet_str(crs: &str, metadata: &HashMap<String, String>) -> Self {
+        let de: Value = serde_json::from_str(crs).unwrap();
+
+        // A CRS that does not exist or is empty defaults to 4326
+        // TODO: http link to parquet geospatial doc
+        let Some(crs) = de["crs"].as_str() else {
+            return Crs::Srid(4326);
+        };
+
+        if let Some(key) = crs.strip_prefix("projjson:") {
+            let Some(proj_meta) = metadata.get(key) else {
+                panic!("Failed to find key in meta: {:?}", metadata)
+            };
+
+            Self::Projjson(serde_json::from_str(proj_meta).unwrap())
+        } else if let Some(srid) = crs.strip_prefix("srid:") {
+            Self::Srid(srid.parse().unwrap())
+        } else {

Review Comment:
   > Just to make sure I'm understanding this case correctly. The incoming
   > metadata would look like:
   > `{"crs": "<PROJJSON string>"}`
   
   From Parquet, the actual `crs` payload is most commonly
`Some("<PROJJSON string>")`. That is because Arrow C++ has no reasonable way to
dump the string into the metadata and write `Some("projjson:<key>")` instead,
and because most CRSes arrive from GeoArrow as PROJJSON.
   
   From GeoArrow, it's possible to get `{"crs": "<PROJJSON string>"}` (from at
least one version of GeoPandas), `{"crs": "AUTH:CODE"}`, or
`{"crs": {<JSON object of PROJJSON>}}` (most common).
   
   > I would be a little bit concerned about the potential for false positives 
or potentially malformed PROJ objects
   
   For the specific case of converting Parquet CRSes to GeoArrow ones, I don't 
think this matters. It's this implementation's job to pass on as much CRS 
information to GeoArrow as accurately as it can (and vice versa). If you try to 
parse JSON and it fails, you can (and I would argue you should) just pass on 
the string in either direction.
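
   A minimal sketch of that fallback, under stated assumptions: `GeoArrowCrs`
and `parse_or_passthrough` are hypothetical names used only for illustration
(they are not part of this PR or of any crate), and the JSON parser is passed in
as a parameter so the sketch compiles with the standard library alone.

```rust
/// Hypothetical output type for this sketch; it is NOT the PR's `Crs` enum.
#[derive(Debug, PartialEq)]
enum GeoArrowCrs<J> {
    /// The payload parsed successfully (for example into a PROJJSON value).
    Json(J),
    /// The payload did not parse, so the original string is forwarded as-is.
    Text(String),
}

/// Try to parse `raw` with the supplied parser. On failure, keep the string
/// unchanged instead of panicking or returning an error.
fn parse_or_passthrough<J, E>(
    raw: &str,
    parse: impl FnOnce(&str) -> Result<J, E>,
) -> GeoArrowCrs<J> {
    match parse(raw) {
        Ok(value) => GeoArrowCrs::Json(value),
        // The parse error is deliberately dropped: the goal is only to pass
        // on as much CRS information as possible.
        Err(_) => GeoArrowCrs::Text(raw.to_string()),
    }
}
```

   With `serde_json` available, the parser argument could be a closure such as
`|s| serde_json::from_str::<serde_json::Value>(s)`, which would replace the
`unwrap()` calls in the hunk above with a pass-through of the original string.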


