nevi-me commented on a change in pull request #7917:
URL: https://github.com/apache/arrow/pull/7917#discussion_r472105674



##########
File path: rust/parquet/src/arrow/schema.rs
##########
@@ -83,12 +90,77 @@ where
         .map(|fields| Schema::new_with_metadata(fields, metadata))
 }
 
+/// Try to convert Arrow schema metadata into a schema
+fn get_arrow_schema_from_metadata(encoded_meta: &str) -> Option<Schema> {
+    let decoded = base64::decode(encoded_meta);
+    match decoded {
+        Ok(bytes) => {
+            let slice = if bytes[0..4] == [255u8; 4] {
+                &bytes[8..]
+            } else {
+                bytes.as_slice()
+            };
+            let message = arrow::ipc::get_root_as_message(slice);
+            message
+                .header_as_schema()
+                .map(arrow::ipc::convert::fb_to_schema)
+        }
+        Err(err) => {
+            // The C++ implementation returns an error if the schema can't be parsed.
+            // To prevent this, we explicitly log this, then compute the schema without the metadata
+            eprintln!(
+                "Unable to decode the encoded schema stored in {}, {:?}",
+                super::ARROW_SCHEMA_META_KEY,
+                err
+            );
+            None
+        }
+    }
+}
+
+/// Mutates writer metadata by encoding the Arrow schema and storing it in the metadata.
+/// If there is an existing Arrow schema metadata, it is replaced.
+pub fn add_encoded_arrow_schema_to_metadata(

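The slice selection in the hunk indexes `bytes[0..4]` and `&bytes[8..]` directly, which panics if the decoded buffer is shorter than four bytes (or shorter than eight when the marker is present). A minimal bounds-checked sketch of the same two branches using `slice::get`; `message_bytes` and `CONTINUATION_MARKER` are hypothetical names, and only the prefix logic is shown (no base64 or flatbuffer parsing):

```rust
/// Continuation marker (0xFFFFFFFF) checked for in the hunk; when present it
/// is followed by a 4-byte length before the flatbuffer message itself.
const CONTINUATION_MARKER: [u8; 4] = [0xFF; 4];

/// Hypothetical bounds-checked replacement for the `slice` selection:
/// the same two branches, but short input yields `None` instead of panicking.
fn message_bytes(bytes: &[u8]) -> Option<&[u8]> {
    match bytes.get(0..4) {
        // Marker present: skip marker + length; `get` returns None below 8 bytes.
        Some(head) if head == &CONTINUATION_MARKER[..] => bytes.get(8..),
        // At least four bytes and no marker: pass the buffer through unchanged.
        Some(_) => Some(bytes),
        // Fewer than four bytes: direct `bytes[0..4]` indexing would panic here.
        None => None,
    }
}
```

Because `get_arrow_schema_from_metadata` already returns `Option<Schema>`, a caller could write `message_bytes(&bytes)?`, so a truncated buffer falls back to computing the schema without the metadata, just as the base64 error branch does.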
Review comment:
       I can restrict the function's visibility to the crate. We'll likely only ever use it in one place, which is why I hadn't split it.
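Restricting a function to its crate is a one-keyword change in Rust (`pub(crate)`). A minimal sketch with a simplified, hypothetical signature: the module name, `set_encoded_schema`, and the `Vec<(String, String)>` metadata type are stand-ins rather than the parquet crate's API, and the `"ARROW:schema"` key is assumed to match `ARROW_SCHEMA_META_KEY`:

```rust
mod arrow_schema {
    /// Assumed value of the Arrow schema metadata key (stand-in constant).
    pub(crate) const META_KEY: &str = "ARROW:schema";

    /// Hypothetical helper. `pub(crate)` makes it callable anywhere in this
    /// crate without exporting it to downstream users.
    pub(crate) fn set_encoded_schema(metadata: &mut Vec<(String, String)>, encoded: &str) {
        // Drop any existing entry first, mirroring "it is replaced" in the doc comment.
        metadata.retain(|(k, _)| k != META_KEY);
        metadata.push((META_KEY.to_string(), encoded.to_string()));
    }
}
```

Since a `pub(crate)` item is not part of the crate's public API, its signature can still change later without affecting users.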
   
   I'd also prefer replacing `Vec<KeyValue>` with a `HashMap`, partly because `Vec<KeyValue>` is a `parquet_format` detail and partly because hashmaps are more convenient to work with. I can open a JIRA for this.
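A minimal sketch of the `HashMap` approach, converting to and from the `Vec<KeyValue>` form only at the boundary. `KeyValue` here is a local stand-in mirroring the Thrift-generated struct (a `key` and an optional `value`) so the example compiles without `parquet_format`; the function names and the duplicate-key policy are assumptions:

```rust
use std::collections::HashMap;

/// Local stand-in for the Thrift-generated `KeyValue` (assumed shape).
#[derive(Debug, Clone, PartialEq)]
struct KeyValue {
    key: String,
    value: Option<String>,
}

/// Build a map view of the key-value metadata. Collecting into a HashMap
/// inserts in order, so a later duplicate key overwrites an earlier one.
fn to_map(kvs: &[KeyValue]) -> HashMap<String, Option<String>> {
    kvs.iter().map(|kv| (kv.key.clone(), kv.value.clone())).collect()
}

/// Convert back to the Vec form at write time. Keys are sorted so the
/// output is deterministic, since HashMap iteration order is unspecified.
fn to_key_values(map: &HashMap<String, Option<String>>) -> Vec<KeyValue> {
    let mut kvs: Vec<KeyValue> = map
        .iter()
        .map(|(k, v)| KeyValue { key: k.clone(), value: v.clone() })
        .collect();
    kvs.sort_by(|a, b| a.key.cmp(&b.key));
    kvs
}
```

With the map view, replacing an existing Arrow schema entry becomes a single `insert` instead of a search-and-replace over the vector.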




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.