This is an automated email from the ASF dual-hosted git repository.

alamb pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git


The following commit(s) were added to refs/heads/main by this push:
     new bac0cb57af Small optimization in Parquet varint decoder (#8742)
bac0cb57af is described below

commit bac0cb57af36f0c025696db146eccea8f3f469cb
Author: Ed Seidl <[email protected]>
AuthorDate: Fri Oct 31 06:49:41 2025 -0700

    Small optimization in Parquet varint decoder (#8742)
    
    # Which issue does this PR close?
    
    - Part of #5853.
    
    # Rationale for this change
    
    Following the recent improvements in Thrift decoding, the percentage of
    time spent decoding LEB128-encoded integers has increased.
    
    # What changes are included in this PR?
    
    This PR modifies the varint decoder to first test for integers that are
    encoded in a single byte, i.e. unsigned values below 128 (with zig-zag
    encoding, signed values from -64 to 63). Many of the fields in the Parquet
    footer (including all enum values) fall in this range, so optimizing for
    this frequent case makes sense.
    
    # Are these changes tested?
    
    Should be covered by existing tests.
    
    # Are there any user-facing changes?
    
    No
---
 parquet/src/parquet_thrift.rs | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/parquet/src/parquet_thrift.rs b/parquet/src/parquet_thrift.rs
index 8ee018ef95..f9fa66ee0d 100644
--- a/parquet/src/parquet_thrift.rs
+++ b/parquet/src/parquet_thrift.rs
@@ -276,8 +276,13 @@ pub(crate) trait ThriftCompactInputProtocol<'a> {
 
     /// Read a ULEB128 encoded unsigned varint from the input.
     fn read_vlq(&mut self) -> ThriftProtocolResult<u64> {
-        let mut in_progress = 0;
-        let mut shift = 0;
+        // try the happy path first
+        let byte = self.read_byte()?;
+        if byte & 0x80 == 0 {
+            return Ok(byte as u64);
+        }
+        let mut in_progress = (byte & 0x7f) as u64;
+        let mut shift = 7;
         loop {
             let byte = self.read_byte()?;
             in_progress |= ((byte & 0x7F) as u64).wrapping_shl(shift);
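
For readers who want to try the single-byte fast path outside the crate, here is a minimal standalone sketch. It is not the arrow-rs implementation: `SliceReader`, its `&'static str` error type, and `read_zig_zag` are hypothetical stand-ins for the real trait and `ThriftProtocolResult`, and the tail of the loop (which the hunk above does not show) is assumed to follow the usual ULEB128 pattern.

```rust
/// Hypothetical byte source standing in for the real Thrift input protocol.
struct SliceReader<'a> {
    buf: &'a [u8],
}

impl<'a> SliceReader<'a> {
    /// Pop one byte off the front of the slice (assumed helper).
    fn read_byte(&mut self) -> Result<u8, &'static str> {
        let (&first, rest) = self.buf.split_first().ok_or("unexpected end of input")?;
        self.buf = rest;
        Ok(first)
    }

    /// Decode a ULEB128 unsigned varint, checking the one-byte case first.
    fn read_vlq(&mut self) -> Result<u64, &'static str> {
        // Fast path: no continuation bit means the value fits in 7 bits.
        let byte = self.read_byte()?;
        if byte & 0x80 == 0 {
            return Ok(byte as u64);
        }
        // Slow path: keep the low 7 bits and continue from bit 7.
        let mut in_progress = (byte & 0x7f) as u64;
        let mut shift = 7;
        loop {
            let byte = self.read_byte()?;
            in_progress |= ((byte & 0x7f) as u64).wrapping_shl(shift);
            // Assumed loop tail: stop at the first byte without a continuation bit.
            if byte & 0x80 == 0 {
                return Ok(in_progress);
            }
            shift += 7;
        }
    }

    /// Undo zig-zag encoding on top of the varint (illustrative helper).
    fn read_zig_zag(&mut self) -> Result<i64, &'static str> {
        let v = self.read_vlq()?;
        Ok((v >> 1) as i64 ^ -((v & 1) as i64))
    }
}
```

With this sketch, the single byte `0x7E` decodes through `read_zig_zag` to 63 and `0x7F` to -64, the limits of the one-byte range, while the two bytes `0xAC 0x02` take the loop and decode through `read_vlq` to 300.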
