alamb commented on code in PR #8225:
URL: https://github.com/apache/arrow-rs/pull/8225#discussion_r2380137202


##########
parquet/src/geospatial/statistics.rs:
##########
@@ -0,0 +1,172 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Geospatial statistics for Parquet files.
+//!
+//! This module provides functionality for working with geospatial statistics in Parquet files.
+//! It includes support for bounding boxes and geospatial statistics in column chunk metadata.
+
+use crate::errors::Result;
+use crate::format::GeospatialStatistics as TGeospatialStatistics;
+use crate::geospatial::bounding_box::BoundingBox;
+
+// ----------------------------------------------------------------------
+// Geospatial Statistics
+
+/// Represents geospatial statistics for a Parquet column or dataset.
+///
+/// This struct contains metadata about the spatial characteristics of geospatial data,
+/// including bounding box information and the types of geospatial geometries present.
+/// It's used to optimize spatial queries and provide spatial context for data analysis.
+///
+/// # Examples
+///
+/// ```
+/// use parquet::geospatial::statistics::GeospatialStatistics;
+/// use parquet::geospatial::bounding_box::BoundingBox;
+///
+/// // Statistics with bounding box
+/// let bbox = BoundingBox::new(0.0, 0.0, 100.0, 100.0);
+/// let stats = GeospatialStatistics::new(Some(bbox), Some(vec![1, 2, 3]));
+/// ```
+#[derive(Clone, Debug, PartialEq, Default)]
+pub struct GeospatialStatistics {
+    /// Optional bounding defining the spatial extent, where None represents a lack of information.
+    bbox: Option<BoundingBox>,
+    /// Optional list of geometry type identifiers, where None represents lack of information
+    geospatial_types: Option<Vec<i32>>,
+}
+
+impl GeospatialStatistics {
+    /// Creates a new geospatial statistics instance with the specified data.
+    pub fn new(bbox: Option<BoundingBox>, geospatial_types: Option<Vec<i32>>) -> Self {
+        Self {
+            bbox,
+            geospatial_types,
+        }
+    }
+}
+
+/// Converts a Thrift-generated geospatial statistics object to the internal representation.
+pub fn from_thrift(
+    geo_statistics: Option<TGeospatialStatistics>,
+) -> Result<Option<GeospatialStatistics>> {

Review Comment:
   Since this is infallible (always returns `Ok`), maybe we could make the `from_thrift` function just return `Option`.
   
   That would also let you simplify the body if you wanted:
   ```rust
   pub fn from_thrift(
       geo_statistics: Option<TGeospatialStatistics>,
   ) -> Option<GeospatialStatistics> {
       let geo_stats = geo_statistics?;
       let bbox = geo_stats.bbox.map(|bbox| bbox.into());
       // If vector is empty, then set it to None
       let geospatial_types: Option<Vec<i32>> =
           geo_stats.geospatial_types.filter(|v| !v.is_empty());
       Some(GeospatialStatistics::new(bbox, geospatial_types))
   }
   ```
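   
   For anyone who wants to try the two `Option` idioms used above in isolation, here is a minimal self-contained sketch. `ThriftGeoStats` and `GeoStats` are hypothetical stand-ins, not the real parquet types, and the bounding box is left out to keep it short:
   
   ```rust
   // Hypothetical stand-in for the Thrift-generated struct (assumption, not the real type).
   #[derive(Debug, Clone, PartialEq)]
   pub struct ThriftGeoStats {
       pub geospatial_types: Option<Vec<i32>>,
   }
   
   // Hypothetical stand-in for the internal representation (assumption, not the real type).
   #[derive(Debug, Clone, PartialEq)]
   pub struct GeoStats {
       pub geospatial_types: Option<Vec<i32>>,
   }
   
   pub fn from_thrift(input: Option<ThriftGeoStats>) -> Option<GeoStats> {
       // `?` on an `Option` returns `None` early when the input is absent.
       let stats = input?;
       // `Option::filter` keeps the `Some` only when the predicate holds,
       // so an empty vector collapses to `None`.
       let geospatial_types = stats.geospatial_types.filter(|v| !v.is_empty());
       Some(GeoStats { geospatial_types })
   }
   ```
   
   With this shape, an absent input yields `None`, while a present input with an empty type list yields `Some` with `geospatial_types` set to `None`.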
   
   



##########
parquet/src/file/metadata/mod.rs:
##########
@@ -1430,6 +1444,12 @@ impl ColumnChunkMetaDataBuilder {
         self
     }
 
+    /// Sets geospatial statistics for this column chunk.
+    pub fn set_geo_statistics(mut self, value: geo_statistics::GeospatialStatistics) -> Self {
+        self.0.geo_statistics = Some(Box::new(value));

Review Comment:
   I think the point would be that a user who already has a `Box<GeospatialStatistics>` could pass it in directly, which would be just a pointer copy, whereas this version might have to move the data from the stack to the heap.
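   
   One possible shape for this, as a rough sketch only: accept `impl Into<Box<_>>`, which the standard library implements for both a plain value and an existing `Box`. `GeoStats` and `ChunkBuilder` below are hypothetical stand-ins, not the actual `ColumnChunkMetaDataBuilder` API:
   
   ```rust
   // Hypothetical stand-in for the statistics type (assumption).
   #[derive(Debug, Clone, PartialEq, Default)]
   pub struct GeoStats {
       pub types: Option<Vec<i32>>,
   }
   
   // Hypothetical stand-in for the builder (assumption).
   #[derive(Debug, Default)]
   pub struct ChunkBuilder {
       geo_statistics: Option<Box<GeoStats>>,
   }
   
   impl ChunkBuilder {
       // Accepts either a `GeoStats` (boxed here via `From<T> for Box<T>`)
       // or an existing `Box<GeoStats>` (the identity conversion, so the
       // existing allocation is reused).
       pub fn set_geo_statistics(mut self, value: impl Into<Box<GeoStats>>) -> Self {
           self.geo_statistics = Some(value.into());
           self
       }
   
       // Read access without exposing the `Box`.
       pub fn geo_statistics(&self) -> Option<&GeoStats> {
           self.geo_statistics.as_deref()
       }
   }
   ```
   
   Callers holding a plain value keep the same call site, and callers holding a `Box` avoid a second allocation.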
   
   



##########
parquet/src/geospatial/statistics.rs:
##########
@@ -0,0 +1,172 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Geospatial statistics for Parquet files.
+//!
+//! This module provides functionality for working with geospatial statistics in Parquet files.
+//! It includes support for bounding boxes and geospatial statistics in column chunk metadata.
+
+use crate::errors::Result;
+use crate::format::GeospatialStatistics as TGeospatialStatistics;
+use crate::geospatial::bounding_box::BoundingBox;
+
+// ----------------------------------------------------------------------
+// Geospatial Statistics
+
+/// Represents geospatial statistics for a Parquet column or dataset.
+///
+/// This struct contains metadata about the spatial characteristics of geospatial data,
+/// including bounding box information and the types of geospatial geometries present.
+/// It's used to optimize spatial queries and provide spatial context for data analysis.
+///
+/// # Examples
+///
+/// ```
+/// use parquet::geospatial::statistics::GeospatialStatistics;
+/// use parquet::geospatial::bounding_box::BoundingBox;
+///
+/// // Statistics with bounding box
+/// let bbox = BoundingBox::new(0.0, 0.0, 100.0, 100.0);
+/// let stats = GeospatialStatistics::new(Some(bbox), Some(vec![1, 2, 3]));
+/// ```
+#[derive(Clone, Debug, PartialEq, Default)]
+pub struct GeospatialStatistics {
+    /// Optional bounding defining the spatial extent, where None represents a lack of information.
+    bbox: Option<BoundingBox>,
+    /// Optional list of geometry type identifiers, where None represents lack of information
+    geospatial_types: Option<Vec<i32>>,

Review Comment:
   Since this is not a public field, we can still change it in the future, though I do see it is part of the constructor.
   
   In my opinion it is fine to leave it like this, and we can adjust or update it in a future release.
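   
   To illustrate the trade-off, here is a small sketch with a hypothetical `GeoStats` type (not the actual struct in this PR, and the accessor is an assumption). The private field can change representation freely, while the constructor parameter is the part that is pinned by the public API:
   
   ```rust
   // Hypothetical type for illustration only.
   #[derive(Debug, Clone, PartialEq, Default)]
   pub struct GeoStats {
       // Private: the storage type can change without affecting callers
       // that only go through methods.
       geospatial_types: Option<Vec<i32>>,
   }
   
   impl GeoStats {
       // The parameter type here is public API, so changing it later
       // would be a breaking change.
       pub fn new(geospatial_types: Option<Vec<i32>>) -> Self {
           Self { geospatial_types }
       }
   
       // Returning a slice hides whether the data is stored in a `Vec`.
       pub fn geospatial_types(&self) -> Option<&[i32]> {
           self.geospatial_types.as_deref()
       }
   }
   ```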


