BlakeOrth commented on code in PR #8801:
URL: https://github.com/apache/arrow-rs/pull/8801#discussion_r2501098970


##########
parquet/src/arrow/schema/mod.rs:
##########
@@ -76,16 +77,18 @@ pub(crate) fn parquet_to_arrow_schema_and_fields(
     key_value_metadata: Option<&Vec<KeyValue>>,
 ) -> Result<(Schema, Option<ParquetField>)> {
     let mut metadata = 
parse_key_value_metadata(key_value_metadata).unwrap_or_default();
-    let maybe_schema = metadata
+    let mut maybe_schema = metadata
         .remove(super::ARROW_SCHEMA_META_KEY)
         .map(|value| get_arrow_schema_from_metadata(&value))
         .transpose()?;
 
     // Add the Arrow metadata to the Parquet metadata skipping keys that 
collide
-    if let Some(arrow_schema) = &maybe_schema {
+    if let Some(arrow_schema) = maybe_schema.as_mut() {
         arrow_schema.metadata().iter().for_each(|(k, v)| {
             metadata.entry(k.clone()).or_insert_with(|| v.clone());
         });
+        #[cfg(feature = "geospatial")]
+        parquet_geospatial::crs::parquet_to_arrow(arrow_schema, &metadata)

Review Comment:
   The primary question I have is related to these lines here. I started off 
trying to solve the most complex case of geospatial CRS metdata noted by 
@paleolimbot in this comment:
    - https://github.com/apache/arrow-rs/pull/8222#discussion_r2299643001
   
   The issue is when there's an Arrow metadata key of `projjson:<key>` the 
actual `projjson` CRS data exists in the Parquet `KeyValue` metdata. When 
reading parquet, this seems to be the last place in the schema conversion chain 
that `metadata` is still available.
   
   If we want to avoid breaking the existing public API for building extension 
types I think converting from the parquet CRS to an Arrow CRS has to happen 
here since we need access to `metadata`. An alternate approach would be passing 
the parquet metadata further down the stack so it's available for users to 
reference. At the current time it looks like only `GEOMETRY` and `GEOGRAPHY` 
would be consumers of that, but it could be a more future-proof solution if 
we'd prefer to take that route.
   
   If my review of the PRs was correct, I believe at least some of this tooling 
is relatively new for the Variant type support from @alamb, so any guidance on 
a preferred way to handle this case would be appreciated!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to