etseidl commented on code in PR #8797:
URL: https://github.com/apache/arrow-rs/pull/8797#discussion_r2516084430


##########
parquet/src/file/metadata/options.rs:
##########
@@ -48,11 +70,70 @@ impl ParquetMetaDataOptions {
         self.schema_descr = Some(val);
     }
 
-    /// Provide a schema to use when decoding the metadata. Returns `Self` for 
chaining.
-    pub fn with_schema(mut self, val: SchemaDescPtr) -> Self {
-        self.schema_descr = Some(val);
-        self
+    // with_schema
+    add_mutator!(schema, SchemaDescPtr);
+
+    /// Returns whether to present the `encoding_stats` field of the 
`ColumnMetaData` as a
+    /// bitmask.
+    ///
+    /// See [`ColumnChunkMetaData::page_encoding_stats_mask`] for an 
explanation of why this
+    /// might be desirable.
+    ///
+    /// [`ColumnChunkMetaData::page_encoding_stats_mask`]:
+    /// crate::file::metadata::ColumnChunkMetaData::page_encoding_stats_mask
+    pub fn encoding_stats_as_mask(&self) -> bool {
+        self.encoding_stats_as_mask
+    }
+
+    /// Convert `encoding_stats` from a vector of [`PageEncodingStats`] to a 
bitmask. This can
+    /// speed up metadata decoding while still enabling some use cases served 
by the full stats.
+    ///
+    /// See [`ColumnChunkMetaData::page_encoding_stats_mask`] for more 
information.
+    ///
+    /// [`PageEncodingStats`]: crate::file::metadata::PageEncodingStats
+    /// [`ColumnChunkMetaData::page_encoding_stats_mask`]:
+    /// crate::file::metadata::ColumnChunkMetaData::page_encoding_stats_mask
+    pub fn set_encoding_stats_as_mask(&mut self, val: bool) {
+        self.encoding_stats_as_mask = val;
     }
+
+    // with_encoding_stats_as_mask
+    add_mutator!(encoding_stats_as_mask, bool);

Review Comment:
   Now that I've done the macro, it saves all of 3 lines per instance. I'm fine 
with doing away with it.
   
   One reason to keep this concise is that once an API is finalized for this 
setting, I'll add options for chunk statistics, size statistics, and geo 
statistics, and potentially an option to skip decoding the page index 
length/offsets as well (oh, and bloom filters too). That's a lot of repetition 
with high chances of cut-and-paste errors that I'd like to avoid, especially 
with the addition of per-column setters (i.e. the poorly named `set_keep_X`). 
I'd really like the whole suite of accessors/setters to be generated with a 
macro, but then documentation becomes problematic. Suggestions welcome 😄 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to