etseidl commented on code in PR #6630:
URL: https://github.com/apache/arrow-rs/pull/6630#discussion_r1819543026
##########
parquet/src/errors.rs:
##########
@@ -47,6 +47,8 @@ pub enum ParquetError {
IndexOutOfBound(usize, usize),
/// An external error variant
External(Box<dyn Error + Send + Sync>),
+ /// Returned when a function needs more data to complete properly.
Review Comment:
In an earlier POC I did have two separate errors, one with a range, but we
thought that a bit redundant at the time. I suppose we could add an
`Option<Range<usize>>` to `NeedMoreData` to handle the case of insufficient
buffer to read the page indexes. There we know the exact range we need. Right
now the `needed` value is set to the number of bytes read from the tail of the
file. If `try_parse` fails when reading the page indexes, then we really only
need the page index range...there's no need to read the footer all over again.
But then the error handling becomes more complex...if the range in
`NeedMoreData` is `None`, then read the required bytes from the tail and call
`try_parse` again; if range is not `None`, then read the range and call
`read_page_indexes`. Ofc a user is free to ignore the range if they want to
keep the error handling simple at the cost of more I/O (but that would likely
be negligible since those bytes should still be in buffer cache).
For now I'll try to improve the documentation. Let me know if you want me to
add the optional `Range` to `NeedMoreData`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]