This is an automated email from the ASF dual-hosted git repository.

dheres pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


The following commit(s) were added to refs/heads/main by this push:
     new ba3446bb90 [Parquet] perf: reuse seeked File clone in ChunkReader::get_read() (#9214)
ba3446bb90 is described below

commit ba3446bb90cc652a45e909f1caf6a64c39a57609
Author: Florian Valeye <[email protected]>
AuthorDate: Sun Jan 18 21:09:13 2026 +0100

    [Parquet] perf: reuse seeked File clone in ChunkReader::get_read() (#9214)
    
    # Which issue does this PR close?
    N/A, it's a minor performance fix.
    
    # Rationale for this change
    While reviewing Parquet performance, I observed a duplicate
    `try_clone()`. I wasn't able to tell why it was required. After
    benchmarking and running tests, it seems there is no reason for the
    duplication.
    `ChunkReader::get_read()` for `File` calls
    [`try_clone()`](https://doc.rust-lang.org/std/fs/struct.File.html#method.try_clone)
    twice: once to seek, then again for the `BufReader`, discarding the
    first clone. This is wasteful, as each `try_clone()` duplicates the
    file descriptor via a system call. Reusing the first clone means one
    fewer `dup()` syscall per `get_read()` call.
    
    # What changes are included in this PR?
    Reuse the already-seeked file clone instead of creating a new one.
    
    # Are these changes tested?
    Covered by existing tests.
    Local benchmarks using [divan](https://github.com/nvzqz/divan) show ~36%
    improvement for `get_read()` calls on my laptop.
    
    # Are there any user-facing changes?
    No.
---
 parquet/src/file/reader.rs | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/parquet/src/file/reader.rs b/parquet/src/file/reader.rs
index 3adf10fac2..2b3c46f507 100644
--- a/parquet/src/file/reader.rs
+++ b/parquet/src/file/reader.rs
@@ -93,7 +93,7 @@ impl ChunkReader for File {
     fn get_read(&self, start: u64) -> Result<Self::T> {
         let mut reader = self.try_clone()?;
         reader.seek(SeekFrom::Start(start))?;
-        Ok(BufReader::new(self.try_clone()?))
+        Ok(BufReader::new(reader))
     }
 
     fn get_bytes(&self, start: u64, length: usize) -> Result<Bytes> {
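
The one-line change above can be tried in isolation. Below is a minimal sketch, not the parquet crate's code: `get_read_sketch` and `demo` are hypothetical names, and the scratch file name is an assumption made for illustration. It relies only on the documented behaviour of `File::try_clone()`, whose clones share one underlying file handle, so a seek on a clone also moves the cursor seen by the original.

```rust
use std::fs::File;
use std::io::{self, BufReader, Read, Seek, SeekFrom, Write};

/// Hypothetical stand-in for `ChunkReader::get_read()` on `File`:
/// clone the handle once, seek the clone, then buffer that same clone.
fn get_read_sketch(file: &File, start: u64) -> io::Result<BufReader<File>> {
    let mut reader = file.try_clone()?; // the only descriptor duplication
    reader.seek(SeekFrom::Start(start))?;
    Ok(BufReader::new(reader)) // reuse the already-seeked clone
}

/// Writes a small scratch file, then reads it back starting at byte offset 6.
fn demo() -> io::Result<String> {
    // Assumed scratch location; the process id keeps the name unique per run.
    let path = std::env::temp_dir()
        .join(format!("get_read_sketch_{}.bin", std::process::id()));
    File::create(&path)?.write_all(b"hello parquet")?;

    let file = File::open(&path)?;
    let mut out = String::new();
    get_read_sketch(&file, 6)?.read_to_string(&mut out)?;

    // Close the handle before deleting the scratch file.
    drop(file);
    std::fs::remove_file(&path)?;
    Ok(out)
}
```

Calling `demo()` should return the tail of the scratch file from offset 6 onward, with a single `try_clone()` per read.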
