etseidl commented on code in PR #9997:
URL: https://github.com/apache/arrow-rs/pull/9997#discussion_r3277288513


##########
parquet/src/file/metadata/thrift/mod.rs:
##########
@@ -621,6 +621,26 @@ fn read_column_chunk<'a>(
     Ok(col)
 }
 
+fn read_schema(prot: &mut ThriftSliceInputProtocol) -> 
Result<SchemaDescriptor> {
+    let mut schema = read_thrift_vec::<SchemaElement, 
ThriftSliceInputProtocol>(&mut *prot)?;
+    // An earlier version of this crate enforced this when decoding 
LogicalType. Now that
+    // the decoder is macro generated, we do this to preserve the original 
behavior.
+    // TODO: this was done due to a line in the spec saying an unset algorithm 
defaults
+    // to SPHERICAL. But there is no default in the Thrift, so it would be 
better to set the
+    // default when consumed.
+    // See 
https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#geography
+    for se in schema.iter_mut() {
+        match se.logical_type.as_mut() {
+            Some(LogicalType::Geography(g)) if g.algorithm.is_none() => {
+                g.algorithm = Some(Default::default());
+            }

Review Comment:
   @paleolimbot I'm wondering if this is still necessary, or if this can be 
enforced where the logical type is consumed. For instance, the one place I 
could find where removing this logic would have an impact is when handling the 
extension type.
   ```rust
           LogicalType::Geography(geography) => {
               let algorithm = geography.algorithm.map(|a| 
a.try_as_edges()).transpose()?;
               let md = 
parquet_geospatial::WkbMetadata::new(geography.crs.as_deref(), algorithm);
           ...
           }
   ```
   could instead be
   ```rust
           LogicalType::Geography(geography) => {
               // if algorithm is None, then set to the default 
(EdgeInterpolationAlgorithm::SPERICAL).
               let algorithm = geography.algorithm.or(Some(Default::default()));
               let algorithm = algorithm.map(|a| a.try_as_edges()).transpose()?;
               let md = 
parquet_geospatial::WkbMetadata::new(geography.crs.as_deref(), algorithm);
           ...
           }
   ```
   Or we could add an impl for `GeographyType` with an `algorithm()` function 
that does the same.
   
   Would removing this cause a lot of thrash downstream?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to