etseidl commented on code in PR #9997:
URL: https://github.com/apache/arrow-rs/pull/9997#discussion_r3277288513
##########
parquet/src/file/metadata/thrift/mod.rs:
##########
@@ -621,6 +621,26 @@ fn read_column_chunk<'a>(
Ok(col)
}
+fn read_schema(prot: &mut ThriftSliceInputProtocol) ->
Result<SchemaDescriptor> {
+ let mut schema = read_thrift_vec::<SchemaElement,
ThriftSliceInputProtocol>(&mut *prot)?;
+ // An earlier version of this crate enforced this when decoding
LogicalType. Now that
+ // the decoder is macro generated, we do this to preserve the original
behavior.
+ // TODO: this was done due to a line in the spec saying an unset algorithm
defaults
+ // to SPHERICAL. But there is no default in the Thrift, so it would be
better to set the
+ // default when consumed.
+ // See
https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#geography
+ for se in schema.iter_mut() {
+ match se.logical_type.as_mut() {
+ Some(LogicalType::Geography(g)) if g.algorithm.is_none() => {
+ g.algorithm = Some(Default::default());
+ }
Review Comment:
@paleolimbot I'm wondering if this is still necessary, or if this can be
enforced where the logical type is consumed. For instance, the one place I
could find where removing this logic would have an impact is when handling the
extension type.
```rust
LogicalType::Geography(geography) => {
let algorithm = geography.algorithm.map(|a|
a.try_as_edges()).transpose()?;
let md =
parquet_geospatial::WkbMetadata::new(geography.crs.as_deref(), algorithm);
...
}
```
could instead be
```rust
LogicalType::Geography(geography) => {
// if algorithm is None, then set to the default
(EdgeInterpolationAlgorithm::SPERICAL).
let algorithm = geography.algorithm.or(Some(Default::default()));
let algorithm = algorithm.map(|a| a.try_as_edges()).transpose()?;
let md =
parquet_geospatial::WkbMetadata::new(geography.crs.as_deref(), algorithm);
...
}
```
Would removing this cause a lot of thrash downstream?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]