etseidl commented on code in PR #6630:
URL: https://github.com/apache/arrow-rs/pull/6630#discussion_r1819543026


##########
parquet/src/errors.rs:
##########
@@ -47,6 +47,8 @@ pub enum ParquetError {
     IndexOutOfBound(usize, usize),
     /// An external error variant
     External(Box<dyn Error + Send + Sync>),
+    /// Returned when a function needs more data to complete properly.

Review Comment:
   In an earlier POC I did have two separate errors, one with a range, but we 
thought that a bit redundant at the time. I suppose we could add an 
`Option<Range<usize>>` to `NeedMoreData` to handle the case of insufficient 
buffer to read the page indexes. There we know the exact range we need. Right 
now the `needed` value is set to the number of bytes read from the tail of the 
file. If `try_parse` fails when reading the page indexes, then we really only 
need the page index range...there's no need to read the footer all over again. 
But then the error handling becomes more complex...if the range in 
`NeedMoreData` is `None`, then read the required bytes from the tail and call 
`try_parse` again; if range is not `None`, then read the range and call 
`read_page_indexes`. Ofc a user is free to ignore the range if they want to 
keep the error handling simple at the cost of more I/O (but that would likely 
be negligible since those bytes should still be in buffer cache).
   
   For now I'll try to improve the documentation. Let me know if you want me to 
add the optional `Range` to `NeedMoreData`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to