tustvold opened a new pull request #1054:
URL: https://github.com/apache/arrow-rs/pull/1054
# Which issue does this PR close?
Highly experimental. Builds on #1039, #1052 and #1041.
Closes #1037
# Rationale for this change
See ticket.
This gives a 2-6x performance improvement when decoding columns containing
nulls. As expected, the biggest savings are where the other decode overheads
are smallest, with the 6x improvement on the "Int32Array, plain encoded,
optional, half NULLs - old" benchmark.
_There is some funkiness with the benchmarks and the memory allocator on my
local machine: it is "faster" to preallocate a single 64 byte array before
trying to read data._
# What changes are included in this PR?
This changes `RecordReader` to use a new `DefinitionLevelBuffer` with a
corresponding `DefinitionLevelDecoder` that can read directly from parquet,
skipping intermediate buffering and avoiding decoding packed bitmasks where
not necessary.
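
To illustrate the idea only (this is not the code in this PR), here is a minimal sketch of why intermediate buffering can be skipped for a flat optional column. It assumes a maximum definition level of 1, so each level is a single bit, and relies on both parquet's bit-packed runs and Arrow's validity bitmaps being LSB-first. `ValidityBuilder`, `append_run` and `append_packed` are hypothetical names, not part of the `parquet` or `arrow` crates.

```rust
/// Hypothetical builder for an LSB-first validity bitmask (1 = valid, 0 = null).
/// Assumes max definition level == 1, so one definition level maps to one bit.
struct ValidityBuilder {
    bytes: Vec<u8>,
    len: usize, // number of bits written so far
}

impl ValidityBuilder {
    fn new() -> Self {
        Self { bytes: Vec::new(), len: 0 }
    }

    /// Append a single bit, growing the byte buffer when a new byte starts.
    fn push(&mut self, valid: bool) {
        if self.len % 8 == 0 {
            self.bytes.push(0);
        }
        if valid {
            let last = self.bytes.len() - 1;
            self.bytes[last] |= 1 << (self.len % 8);
        }
        self.len += 1;
    }

    /// Append an RLE run: `count` repetitions of the same level.
    fn append_run(&mut self, valid: bool, count: usize) {
        for _ in 0..count {
            self.push(valid);
        }
    }

    /// Append `count` levels from a bit-packed run. When the output is
    /// byte-aligned, whole bytes are copied as-is instead of being decoded
    /// into individual levels first.
    fn append_packed(&mut self, packed: &[u8], count: usize) {
        assert!(packed.len() * 8 >= count, "not enough packed bits");
        let mut start = 0;
        if self.len % 8 == 0 {
            let whole = count / 8;
            self.bytes.extend_from_slice(&packed[..whole]);
            self.len += whole * 8;
            start = whole * 8;
        }
        // Remaining bits (or all of them, if unaligned) go bit by bit.
        for i in start..count {
            self.push(packed[i / 8] & (1 << (i % 8)) != 0);
        }
    }

    /// Number of nulls; unused trailing bits are always zero.
    fn null_count(&self) -> usize {
        let set: usize = self.bytes.iter().map(|b| b.count_ones() as usize).sum();
        self.len - set
    }
}
```

In this sketch an aligned bit-packed run becomes a plain byte copy, and an RLE run never needs to be expanded into a per-value level array. A real decoder would also need to handle unaligned runs more efficiently and nested columns where the maximum level exceeds 1.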
# Are there any user-facing changes?
No