[GitHub] [arrow-rs] tustvold opened a new pull request #1054: Preserve Parquet Bitmask (#1037)

GitBox Fri, 17 Dec 2021 09:28:08 -0800


tustvold opened a new pull request #1054:
URL: https://github.com/apache/arrow-rs/pull/1054



   # Which issue does this PR close?
   
   Highly experimental, builds on #1039, #1052 and #1041 
   
   Closes #1037 
   
   # Rationale for this change
    
   See ticket.
   
   This yields a 2-6x performance improvement when decoding columns 
containing nulls. As expected, the biggest savings come where the other decode 
overheads are smallest, with the 6x improvement on "Int32Array, plain encoded, 
optional, half NULLs - old "
   
   _There is some funkiness with the benchmarks and the memory allocator on my 
local machine: it is "faster" to preallocate a single 64-byte array before 
trying to read data._ 
   
   # What changes are included in this PR?
   
   This changes `RecordReader` to use a new `DefinitionLevelBuffer` with a 
corresponding `DefinitionLevelDecoder` that can read directly from parquet, 
skipping intermediate buffering and avoiding decoding packed bitmasks where 
not necessary.
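
For readers unfamiliar with the idea, here is a minimal sketch of the work that can be skipped. It is not the code in this PR. It assumes a flat optional column (maximum definition level 1) and expands definition levels into an LSB-first packed validity bitmap one value at a time. The function name `levels_to_validity` is hypothetical and is not part of the arrow-rs or parquet APIs. When the levels already arrive bit-packed at width 1, they should be, as far as I understand the format, essentially this bitmap already, so decoding them to integers only to re-pack them is wasted effort.

```rust
/// Hypothetical helper for illustration only (not an arrow-rs API).
/// Builds an LSB-first packed validity bitmap from definition levels:
/// bit `i` is set when value `i` is non-null, i.e. its definition level
/// equals the maximum definition level of the column.
fn levels_to_validity(def_levels: &[i16], max_def_level: i16) -> Vec<u8> {
    // One bit per value, rounded up to a whole number of bytes.
    let mut bitmap = vec![0u8; (def_levels.len() + 7) / 8];
    for (i, &level) in def_levels.iter().enumerate() {
        if level == max_def_level {
            bitmap[i / 8] |= 1 << (i % 8);
        }
    }
    bitmap
}

fn main() {
    // Nine values of an optional column: 1 = present, 0 = null.
    let def_levels = [1, 0, 1, 1, 0, 0, 1, 0, 1];
    let bitmap = levels_to_validity(&def_levels, 1);
    // Bits 0, 2, 3 and 6 are set in the first byte, bit 0 in the second.
    println!("{:08b} {:08b}", bitmap[0], bitmap[1]);
}
```

The per-value loop above is the intermediate step that reading the packed bitmask directly avoids for this kind of column.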
   
   # Are there any user-facing changes?
   
   No
   


