[GitHub] [arrow-rs] tustvold commented on a diff in pull request #2478: Add ChunkReader::get_bytes

GitBox Wed, 17 Aug 2022 04:23:01 -0700


tustvold commented on code in PR #2478:
URL: https://github.com/apache/arrow-rs/pull/2478#discussion_r947808290



##########
parquet/src/file/serialized_reader.rs:
##########
@@ -623,26 +627,13 @@ impl<R: ChunkReader> PageReader for SerializedPageReader<R> {
 
                     let page_len = front.compressed_page_size as usize;
 
-                    // TODO: Add ChunkReader get_bytes to potentially avoid copy
-                    let mut buffer = Vec::with_capacity(page_len);
-                    let read = self
-                        .reader
-                        .get_read(front.offset as u64, page_len)?
-                        .read_to_end(&mut buffer)?;
-
-                    if read != page_len {
-                        return Err(eof_err!(
-                            "Expected to read {} bytes of page, read only {}",
-                            page_len,
-                            read
-                        ));
-                    }
+                    let buffer = self.reader.get_bytes(front.offset as u64, page_len)?;

Review Comment:
   We can only do this when we have an offset index, as we need to know the 
size of the page to read. There is a question over whether we could just 
eagerly fetch the entire column chunk when there is no offset index; this 
needs some investigation. It would drastically simplify a lot of the code (it 
would eliminate FileSource).
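
To make the constraint concrete, here is a minimal, std-only sketch of the idea. It is not the arrow-rs implementation: `SketchChunkReader`, `InMemory`, `PageLocation` and `read_page` are hypothetical names, and `Vec<u8>` stands in for whatever buffer type the real `get_bytes` returns. The default `get_bytes` body mirrors the copy-and-check logic removed in the diff above; a reader that already holds the data could override it to skip the copy.

```rust
use std::io::{self, Cursor, Read};

/// Hypothetical stand-in for the parquet `ChunkReader` trait (illustrative only).
trait SketchChunkReader {
    type T: Read;

    /// Returns a reader starting at `start`, covering at most `length` bytes.
    fn get_read(&self, start: u64, length: usize) -> io::Result<Self::T>;

    /// Returns exactly `length` bytes starting at `start`, or an EOF error.
    /// Default: copy through `get_read`, as the removed code did inline.
    fn get_bytes(&self, start: u64, length: usize) -> io::Result<Vec<u8>> {
        let mut buffer = Vec::with_capacity(length);
        let read = self
            .get_read(start, length)?
            .take(length as u64)
            .read_to_end(&mut buffer)?;
        if read != length {
            return Err(io::Error::new(
                io::ErrorKind::UnexpectedEof,
                format!("Expected to read {} bytes of page, read only {}", length, read),
            ));
        }
        Ok(buffer)
    }
}

/// Hypothetical in-memory source used only to exercise the sketch.
struct InMemory(Vec<u8>);

impl SketchChunkReader for InMemory {
    type T = Cursor<Vec<u8>>;

    fn get_read(&self, start: u64, length: usize) -> io::Result<Self::T> {
        let len = self.0.len();
        // Clamp the requested range to the available data.
        let start = start.min(len as u64) as usize;
        let end = start.saturating_add(length).min(len);
        Ok(Cursor::new(self.0[start..end].to_vec()))
    }
}

/// Hypothetical page location, i.e. what an offset index entry provides.
struct PageLocation {
    offset: u64,
    compressed_page_size: usize,
}

/// Both arguments to `get_bytes` come from the page location, so this call
/// is only possible when the page size is known before reading.
fn read_page<R: SketchChunkReader>(reader: &R, location: &PageLocation) -> io::Result<Vec<u8>> {
    reader.get_bytes(location.offset, location.compressed_page_size)
}
```

Without a `PageLocation` the caller has no length to pass, which is why the single `get_bytes` call cannot replace the streaming path in that case.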



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
