zhuqi-lucas commented on code in PR #18160:
URL: https://github.com/apache/datafusion/pull/18160#discussion_r2448441297


##########
datafusion/common/src/config.rs:
##########
@@ -620,7 +620,10 @@ config_namespace! {
         /// bytes of the parquet file optimistically. If not specified, two reads are required:
         /// One read to fetch the 8-byte parquet footer and
         /// another to fetch the metadata length encoded in the footer
-        pub metadata_size_hint: Option<usize>, default = None
+        /// Default setting to 512 KB, which should be sufficient for most parquet files,
+        /// it can reduce one I/O operation per parquet file. If the metadata is larger than
+        /// the hint, two reads will still be performed.
+        pub metadata_size_hint: Option<usize>, default = Some(512 * 1024)

Review Comment:
   Thanks, I agree this is a concern when there are many small files on local disk.
   
   > will kick off benchmarks.
   > 
   > I think the potential downside of this approach is that it will make larger requests to objectstore / local disk by default and use slightly more memory for small files (it will always fetch / buffer 512K even if the actual footer is much smaller)
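
To make the trade-off in the quoted comment concrete, here is a rough sketch of the read pattern with and without a size hint. It is not the DataFusion or parquet reader code: `reads_needed` is a hypothetical helper, the file and metadata sizes are made up, and the byte count for the second read is a simplification (a real reader may fetch only the part it is still missing).

```rust
/// A Parquet file ends with an 8-byte footer:
/// a 4-byte little-endian metadata length followed by the 4-byte magic "PAR1".
const FOOTER_SIZE: usize = 8;

/// Hypothetical helper (not a DataFusion API): returns
/// (number of reads, total bytes fetched) needed to load the metadata.
fn reads_needed(file_size: usize, metadata_len: usize, hint: Option<usize>) -> (usize, usize) {
    // First read: the last `hint` bytes of the file, or just the footer when
    // no hint is set. It is never smaller than the footer and never larger
    // than the file itself.
    let first = hint.unwrap_or(FOOTER_SIZE).max(FOOTER_SIZE).min(file_size);
    let needed = metadata_len + FOOTER_SIZE;
    if first >= needed {
        // The optimistic read already covers footer + metadata.
        (1, first)
    } else {
        // Assumption: the second read fetches the whole metadata block.
        (2, first + metadata_len)
    }
}

fn main() {
    let hint = Some(512 * 1024);
    // Illustrative (file size, metadata length) pairs, not measurements.
    let cases = [
        (10_000_000, 20_000),    // typical file, small metadata
        (100_000, 2_000),        // small file: the hint is clamped to the file size
        (10_000_000, 1_000_000), // metadata larger than the hint
    ];
    for (file_size, metadata_len) in cases {
        let without = reads_needed(file_size, metadata_len, None);
        let with = reads_needed(file_size, metadata_len, hint);
        println!(
            "file={file_size} metadata={metadata_len} no_hint={without:?} hint_512k={with:?}"
        );
    }
}
```

Under these assumptions the hint saves one read whenever footer plus metadata fit in 512 KB, at the cost of fetching up to 512 KB per file, which is the extra I/O and memory the quoted comment is concerned about for many small files.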
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

