paleolimbot commented on code in PR #8801:
URL: https://github.com/apache/arrow-rs/pull/8801#discussion_r2518424791


##########
parquet/src/arrow/schema/mod.rs:
##########
@@ -76,16 +77,18 @@ pub(crate) fn parquet_to_arrow_schema_and_fields(
     key_value_metadata: Option<&Vec<KeyValue>>,
 ) -> Result<(Schema, Option<ParquetField>)> {
     let mut metadata = 
parse_key_value_metadata(key_value_metadata).unwrap_or_default();
-    let maybe_schema = metadata
+    let mut maybe_schema = metadata
         .remove(super::ARROW_SCHEMA_META_KEY)
         .map(|value| get_arrow_schema_from_metadata(&value))
         .transpose()?;
 
     // Add the Arrow metadata to the Parquet metadata skipping keys that 
collide
-    if let Some(arrow_schema) = &maybe_schema {
+    if let Some(arrow_schema) = maybe_schema.as_mut() {
         arrow_schema.metadata().iter().for_each(|(k, v)| {
             metadata.entry(k.clone()).or_insert_with(|| v.clone());
         });
+        #[cfg(feature = "geospatial")]
+        parquet_geospatial::crs::replace_keyvalue(arrow_schema, &metadata)

Review Comment:
   I'm not sure we need to modify the embedded schema here, which is probably 
more accurate (i.e., already contains the CRS in the form the user 
intended)...was this to help with roundtrip testing?



##########
parquet-geospatial/src/crs.rs:
##########
@@ -0,0 +1,59 @@
+use std::{collections::HashMap, sync::Arc};
+
+use arrow_schema::{Schema, SchemaBuilder};
+use serde::{Deserialize, Serialize};
+
+#[derive(Debug, Default, Serialize, Deserialize)]
+pub struct Crs {
+    crs: Option<String>,
+    crs_type: Option<String>,
+}
+
+impl Crs {
+    // TODO: make fallible
+    fn try_from_parquet_str(crs: &str, metadata: &HashMap<String, String>) -> 
Self {
+        let crs: Crs = serde_json::from_str(crs).unwrap();
+
+        let Some(crs_value) = &crs.crs else {
+            return crs;
+        };
+
+        let Some(key) = crs_value.strip_prefix("projjson:") else {
+            return crs;
+        };
+
+        let Some(proj_meta) = metadata.get(key) else {
+            return crs;
+        };
+
+        Self {
+            crs: Some(proj_meta.clone()),
+            crs_type: Some(String::from("projjson")),
+        }
+    }
+
+    pub(super) fn to_arrow_string(&self) -> String {
+        serde_json::to_string(&self).unwrap_or_default()
+    }

Review Comment:
   To avoid output like `{"crs": "{\"type\": \"GeographicCRS\", ...", 
"crs_type": "projjson"}` (i.e., JSON escaped as a JSON string), I think it 
might be best to attempt parsing `self.crs` (and if it succeeds, return the 
corresponding `Value`). This will also help with roundtripping as the metadata 
probably arrived as `{"crs": {"type": "GeographicCRS", ...", "crs_type": 
"projjson"}`. This may solve the roundtrip testing issue if there was one.
   
   Because `"edges"` will also need to get added here, maybe this could return 
`(Option<Value>, Option<Value>)`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to