paleolimbot commented on code in PR #8801:
URL: https://github.com/apache/arrow-rs/pull/8801#discussion_r2501200659
##########
parquet/src/arrow/schema/mod.rs:
##########
@@ -76,16 +77,18 @@ pub(crate) fn parquet_to_arrow_schema_and_fields(
key_value_metadata: Option<&Vec<KeyValue>>,
) -> Result<(Schema, Option<ParquetField>)> {
let mut metadata =
parse_key_value_metadata(key_value_metadata).unwrap_or_default();
- let maybe_schema = metadata
+ let mut maybe_schema = metadata
.remove(super::ARROW_SCHEMA_META_KEY)
.map(|value| get_arrow_schema_from_metadata(&value))
.transpose()?;
// Add the Arrow metadata to the Parquet metadata skipping keys that collide
- if let Some(arrow_schema) = &maybe_schema {
+ if let Some(arrow_schema) = maybe_schema.as_mut() {
arrow_schema.metadata().iter().for_each(|(k, v)| {
metadata.entry(k.clone()).or_insert_with(|| v.clone());
});
+ #[cfg(feature = "geospatial")]
+ parquet_geospatial::crs::parquet_to_arrow(arrow_schema, &metadata)
Review Comment:
> The issue is that when there's an Arrow metadata key of projjson:<key>, the
> actual projjson CRS data exists in the Parquet KeyValue metadata. When reading
> parquet, this seems to be the last place in the schema conversion chain where
> that metadata is still available.
For what it's worth, no writer can actually write this yet (for a similar
reason: no Parquet implementation has access to a mutable key-value metadata
map when converting types on the write side). If you'd like to punt on this,
you can pass `"crs": "projjson:<key>"` through as the GeoArrow CRS so that a
caller at a higher level can resolve it.
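A minimal sketch of both options (resolving the reference here, or passing it through), assuming the Parquet key-value metadata is available as a `HashMap<String, String>`, as the `metadata` map in the diff appears to be. `resolve_crs` and `CrsValue` are hypothetical names, not part of arrow-rs or `parquet_geospatial`, and the sketch does not build the full GeoArrow extension metadata.

```rust
use std::collections::HashMap;

/// Hypothetical result type: either PROJJSON text found in the file's
/// key-value metadata, or the original CRS string passed through unchanged.
#[derive(Debug, PartialEq)]
enum CrsValue {
    ProjJson(String),
    Verbatim(String),
}

/// Hypothetical helper: if `crs` is a "projjson:<key>" reference and `<key>`
/// exists in the key-value metadata, return the PROJJSON stored under it.
/// Otherwise (missing key, "srid:...", or any other form) return the string
/// verbatim so a higher-level caller can resolve it later.
fn resolve_crs(crs: &str, key_value_metadata: &HashMap<String, String>) -> CrsValue {
    if let Some(key) = crs.strip_prefix("projjson:") {
        if let Some(projjson) = key_value_metadata.get(key) {
            return CrsValue::ProjJson(projjson.clone());
        }
    }
    CrsValue::Verbatim(crs.to_string())
}

fn main() {
    let mut kv = HashMap::new();
    kv.insert("crs_key".to_string(), r#"{"type": "GeographicCRS"}"#.to_string());

    // Key present: the PROJJSON text is looked up and returned.
    println!("{:?}", resolve_crs("projjson:crs_key", &kv));
    // Key absent: the reference is passed through for a caller to handle.
    println!("{:?}", resolve_crs("projjson:missing", &kv));
    // Not a projjson reference: passed through unchanged.
    println!("{:?}", resolve_crs("srid:4326", &kv));
}
```

Returning the reference verbatim when the key is missing is the "punt" option: nothing is lost, and a caller that can see the full file metadata can still resolve it.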