adriangb commented on code in PR #18160:
URL: https://github.com/apache/datafusion/pull/18160#discussion_r2446678689


##########
datafusion/common/src/config.rs:
##########
@@ -620,7 +620,10 @@ config_namespace! {
         /// bytes of the parquet file optimistically. If not specified, two reads are required:
         /// One read to fetch the 8-byte parquet footer and
         /// another to fetch the metadata length encoded in the footer
-        pub metadata_size_hint: Option<usize>, default = None
+        /// Default setting to 512 KB, which should be sufficient for most parquet files,
+        /// it can reduce one I/O operation per parquet file. If the metadata is larger than
+        /// the hint, two reads will still be performed.
+        pub metadata_size_hint: Option<usize>, default = Some(512 * 1024)

Review Comment:
   FWIW, having some prefetch on by default makes a ton of sense to me. I'd 
like to run the benchmarks to make sure it doesn't have a big impact. My guess 
is that we'll see no impact, positive or negative, since the benchmarks run 
against local disk, where an extra small read is cheap.
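
For readers following along, here is a minimal sketch of why the hint can save one read per file. It is not DataFusion's or the parquet crate's actual implementation: `CountingReader`, `fetch_metadata`, and `build_file` are hypothetical names made up for illustration, and the "file" is an in-memory buffer. The only format fact it relies on is that a Parquet file ends with a 4-byte little-endian metadata length followed by the `PAR1` magic.

```rust
/// Hypothetical stand-in for an object store: counts range reads
/// against an in-memory "file".
struct CountingReader<'a> {
    data: &'a [u8],
    reads: usize,
}

impl<'a> CountingReader<'a> {
    fn read_range(&mut self, start: usize, end: usize) -> &'a [u8] {
        self.reads += 1;
        let data: &'a [u8] = self.data;
        &data[start..end]
    }
}

/// 4-byte little-endian metadata length + 4-byte "PAR1" magic.
const FOOTER_LEN: usize = 8;

/// Fetch the metadata bytes, optionally reading `hint` bytes from the
/// end of the file up front (illustrative logic only).
fn fetch_metadata<'a>(reader: &mut CountingReader<'a>, hint: Option<usize>) -> &'a [u8] {
    let file_len = reader.data.len();
    // Without a hint we can only ask for the 8-byte footer.
    let suffix_len = hint.unwrap_or(FOOTER_LEN).max(FOOTER_LEN).min(file_len);
    let suffix_start = file_len - suffix_len;
    let suffix = reader.read_range(suffix_start, file_len);

    let footer = &suffix[suffix_len - FOOTER_LEN..];
    assert_eq!(&footer[4..], b"PAR1", "not a parquet footer");
    let metadata_len =
        u32::from_le_bytes([footer[0], footer[1], footer[2], footer[3]]) as usize;
    let metadata_start = file_len - FOOTER_LEN - metadata_len;

    if metadata_start >= suffix_start {
        // The first read already covered the metadata: one I/O in total.
        let offset = metadata_start - suffix_start;
        &suffix[offset..offset + metadata_len]
    } else {
        // Hint missing or too small: a second read is still required.
        reader.read_range(metadata_start, file_len - FOOTER_LEN)
    }
}

/// Build a fake file: `data_len` payload bytes, `metadata_len` metadata
/// bytes, then the 8-byte footer.
fn build_file(data_len: usize, metadata_len: usize) -> Vec<u8> {
    let mut file = vec![0u8; data_len];
    file.extend(std::iter::repeat(0xABu8).take(metadata_len));
    file.extend_from_slice(&(metadata_len as u32).to_le_bytes());
    file.extend_from_slice(b"PAR1");
    file
}
```

In this sketch, a file with 40 bytes of metadata needs two reads with no hint or with a 16-byte hint, and one read with a 64-byte hint. On local disk the saved read is cheap either way, which is why I'd expect the benchmarks to stay flat; the benefit should show up mainly against object stores, where each request has noticeable latency.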



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

