tustvold commented on code in PR #2478:
URL: https://github.com/apache/arrow-rs/pull/2478#discussion_r947808290
##########
parquet/src/file/serialized_reader.rs:
##########
@@ -623,26 +627,13 @@ impl<R: ChunkReader> PageReader for SerializedPageReader<R> {
let page_len = front.compressed_page_size as usize;
- // TODO: Add ChunkReader get_bytes to potentially avoid copy
- let mut buffer = Vec::with_capacity(page_len);
- let read = self
- .reader
- .get_read(front.offset as u64, page_len)?
- .read_to_end(&mut buffer)?;
-
- if read != page_len {
- return Err(eof_err!(
- "Expected to read {} bytes of page, read only {}",
- page_len,
- read
- ));
- }
+ let buffer = self.reader.get_bytes(front.offset as u64, page_len)?;
Review Comment:
We can only do this when we have an offset index, as we need to know the
size of the page to read. Without an offset index, there is a question over
whether we could just eagerly fetch the entire column chunk instead; this needs
some investigation. It would drastically simplify a lot of the code (it would
eliminate FileSource).
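A minimal, std-only sketch of the two strategies the comment contrasts, under stated assumptions. `RangeReader`, `PageLocation`, `read_pages_with_index`, `read_pages_eagerly` and `read_column_chunk` are hypothetical names for illustration, not parquet crate APIs, and the 4-byte little-endian length prefix is a toy stand-in for the real Thrift-encoded page header. With an offset index, each page's size is known up front, so every page is one exact-length range read (the short-read check the removed code did by hand lives inside `get_bytes`). Without one, the page size is only discovered by decoding each header, so the sketch fetches the whole column chunk once and walks it in memory.

```rust
use std::convert::TryFrom;
use std::io::{self, ErrorKind};

/// Hypothetical stand-in for a chunk reader (NOT the parquet crate's `ChunkReader`).
trait RangeReader {
    /// Return exactly `len` bytes starting at `start`, or an error on a short read.
    fn get_bytes(&self, start: u64, len: usize) -> io::Result<Vec<u8>>;
}

impl RangeReader for Vec<u8> {
    fn get_bytes(&self, start: u64, len: usize) -> io::Result<Vec<u8>> {
        let start = usize::try_from(start)
            .map_err(|_| io::Error::new(ErrorKind::InvalidInput, "offset too large"))?;
        start
            .checked_add(len)
            .and_then(|end| self.get(start..end))
            .map(|bytes| bytes.to_vec())
            .ok_or_else(|| {
                io::Error::new(
                    ErrorKind::UnexpectedEof,
                    format!("expected {} bytes at offset {}", len, start),
                )
            })
    }
}

/// Hypothetical offset-index entry: page start and total page size (header included).
struct PageLocation {
    offset: u64,
    compressed_page_size: usize,
}

/// With an offset index: one exact-length read per page, no header decoding needed.
fn read_pages_with_index<R: RangeReader>(
    reader: &R,
    locations: &[PageLocation],
) -> io::Result<Vec<Vec<u8>>> {
    locations
        .iter()
        .map(|loc| reader.get_bytes(loc.offset, loc.compressed_page_size))
        .collect()
}

/// Without an offset index: fetch the whole column chunk eagerly, then split it
/// into pages by decoding each (toy) header to learn the page length.
fn read_pages_eagerly<R: RangeReader>(
    reader: &R,
    chunk_start: u64,
    chunk_len: usize,
) -> io::Result<Vec<Vec<u8>>> {
    let chunk = reader.get_bytes(chunk_start, chunk_len)?;
    // Non-capturing closure, so it is Copy and can be reused below.
    let truncated = || io::Error::new(ErrorKind::UnexpectedEof, "truncated page");
    let mut pages = Vec::new();
    let mut pos = 0;
    while pos < chunk.len() {
        let header = chunk.get(pos..pos + 4).ok_or_else(truncated)?;
        let body_len = u32::from_le_bytes([header[0], header[1], header[2], header[3]]) as usize;
        let end = (pos + 4).checked_add(body_len).ok_or_else(truncated)?;
        let page = chunk.get(pos..end).ok_or_else(truncated)?;
        pages.push(page.to_vec());
        pos = end;
    }
    Ok(pages)
}

/// Choose a strategy depending on whether an offset index is available.
fn read_column_chunk<R: RangeReader>(
    reader: &R,
    chunk_start: u64,
    chunk_len: usize,
    offset_index: Option<&[PageLocation]>,
) -> io::Result<Vec<Vec<u8>>> {
    match offset_index {
        Some(locations) => read_pages_with_index(reader, locations),
        None => read_pages_eagerly(reader, chunk_start, chunk_len),
    }
}
```

Both paths return the same page bytes; the trade-off is many small exact reads versus one larger read plus in-memory header parsing, which is the simplification the comment suggests investigating.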
--
This is an automated message from the Apache Git Service.