etseidl commented on code in PR #6367:
URL: https://github.com/apache/arrow-rs/pull/6367#discussion_r1759938364


##########
parquet/src/arrow/async_reader/metadata.rs:
##########
@@ -57,6 +57,12 @@ impl<F: MetadataFetch> MetadataLoader<F> {
             return Err(ParquetError::EOF(format!(
                 "file size of {file_size} is less than footer"
             )));
+        } else if let Some(size_hint) = prefetch {
+            if size_hint < 8 {
+                return Err(ParquetError::EOF(format!(
+                    "prefetch size of {size_hint} is less than footer size"
+                )));
+            }

Review Comment:
   I'm thinking along the lines of
   ```rust
           // If a size hint is provided, read more than the minimum size
           // to try and avoid a second fetch.
           let footer_start = if let Some(size_hint) = prefetch {
               // check for hint smaller than footer
               let size_hint = std::cmp::max(size_hint, FOOTER_SIZE);
               // check for hint larger than the file
               let size_hint = std::cmp::min(size_hint, file_size);
               file_size.saturating_sub(size_hint)
           } else {
               file_size - FOOTER_SIZE
           };
   ```
   This guards against the error you're seeing, and it still allows a hint
   larger than the file without triggering extra I/O (as would happen if an
   oversized hint were simply replaced with `None`).
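
For illustration, the same clamping can be pulled into a small standalone function that is easy to test. This is only a sketch, not the PR's code: `footer_start` as a free function and its `Option` return are hypothetical, and `FOOTER_SIZE` is taken to be 8, matching the `size_hint < 8` check in the diff.

```rust
/// Footer size in bytes, assumed to be 8 to match the `size_hint < 8`
/// check in the diff above.
const FOOTER_SIZE: usize = 8;

/// Hypothetical helper: returns the offset at which the initial read
/// should start, or `None` if the file is too small to hold a footer.
fn footer_start(file_size: usize, prefetch: Option<usize>) -> Option<usize> {
    if file_size < FOOTER_SIZE {
        return None;
    }
    let read_len = match prefetch {
        // Raise a too-small hint to the footer size, then cap it at the
        // file size so an oversized hint reads the whole file in one fetch.
        Some(size_hint) => size_hint.max(FOOTER_SIZE).min(file_size),
        // No hint: read only the footer.
        None => FOOTER_SIZE,
    };
    // Cannot underflow: FOOTER_SIZE <= read_len <= file_size here.
    Some(file_size - read_len)
}
```

With a 1000-byte file, a hint of 4 behaves like no hint at all (start at 992), a hint of 64 starts at 936, and a hint of 5000 starts at 0, so the whole file arrives in a single fetch.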


