yordan-pavlov commented on issue #200:
URL: https://github.com/apache/arrow-rs/issues/200#issuecomment-842657902


   UPDATE: over the weekend I implemented a slightly different idea, which appears to have unlocked a new level of performance. Instead of an iterator over structs that are essentially references to contiguous buffer regions, the iterator is now just over pages. From there, value bytes are read as byte slices (&[u8]) and passed to a callback function in a converter, which simply copies each byte slice into a MutableBuffer. This minimizes memory allocations and copies, and also results in a significant performance improvement for string arrays.
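
   The page-iterator-plus-callback idea can be sketched in a few lines of Rust. This is a minimal illustration under stated assumptions, not the code from the linked commit: `StringConverter`, `for_each_plain_byte_array` and `encode_plain` are hypothetical names, a `Vec<u8>` stands in for arrow's `MutableBuffer`, and the sketch only handles plain-encoded BYTE_ARRAY pages (4-byte little-endian length prefix followed by the bytes), ignoring definition levels, dictionary pages and decompression.

```rust
// Hypothetical sketch: iterate over pages, hand each value to a callback as a
// borrowed &[u8], and let the converter copy it into one output buffer.

/// Hypothetical converter that assembles a string array as an offsets
/// buffer plus one contiguous values buffer.
struct StringConverter {
    offsets: Vec<i32>,
    values: Vec<u8>, // stand-in for arrow's MutableBuffer (assumption)
}

impl StringConverter {
    fn new() -> Self {
        StringConverter { offsets: vec![0], values: Vec::new() }
    }

    /// Callback target: copy one value's bytes straight into the output
    /// buffer and record where it ends. No per-value allocation here.
    fn append(&mut self, bytes: &[u8]) {
        self.values.extend_from_slice(bytes);
        self.offsets.push(self.values.len() as i32);
    }
}

/// Walks one plain-encoded BYTE_ARRAY page and passes every value to `f`
/// as a slice borrowed from the page buffer.
fn for_each_plain_byte_array(
    page: &[u8],
    num_values: usize,
    mut f: impl FnMut(&[u8]),
) -> Result<(), String> {
    let mut pos = 0;
    for _ in 0..num_values {
        let len_bytes = page.get(pos..pos + 4).ok_or("truncated length prefix")?;
        let len = u32::from_le_bytes([len_bytes[0], len_bytes[1], len_bytes[2], len_bytes[3]]) as usize;
        pos += 4;
        let value = page.get(pos..pos + len).ok_or("truncated value")?;
        f(value);
        pos += len;
    }
    Ok(())
}

/// Hypothetical helper that builds a plain-encoded page for the demo.
fn encode_plain(values: &[&str]) -> Vec<u8> {
    let mut out = Vec::new();
    for v in values {
        out.extend_from_slice(&(v.len() as u32).to_le_bytes());
        out.extend_from_slice(v.as_bytes());
    }
    out
}

fn main() {
    // Two hypothetical pages, each holding two plain-encoded strings.
    let pages = vec![
        (encode_plain(&["foo", "bar"]), 2),
        (encode_plain(&["", "arrow"]), 2),
    ];
    let mut conv = StringConverter::new();
    for (page, num_values) in &pages {
        for_each_plain_byte_array(page, *num_values, |bytes| conv.append(bytes))
            .expect("malformed page");
    }
    println!("{:?}", conv.offsets); // [0, 3, 6, 6, 11]
    println!("{}", String::from_utf8_lossy(&conv.values)); // foobararrow
}
```

   The point of the design is that the only copy is the final `extend_from_slice` into the output buffer; the values themselves are never materialized as intermediate structs or owned strings.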
   
   The time for my DataFusion benchmark query has also dropped further, from 100ms to 70ms (it was 125ms before all this work).
   
   There is still an issue with the "read Int32Array, dictionary encoded, mandatory, no NULLs" benchmark, where the new version is still slower. In all other cases the new version is now faster than the previous implementation, or roughly on par in the "read Int32Array, dictionary encoded, optional, half NULLs" case. This includes "read Int32Array, plain encoded, mandatory, no NULLs", where the new version used to be slower because the old implementation was already fairly efficient.
   
   Over the next few days I will be looking into a few places in the new code where I think further improvements could be made.
   
   
   Here are the latest benchmark results (each time is given as [lower bound, estimate, upper bound]):

   read Int32Array, plain encoded, mandatory, no NULLs
     old: time: [9.3360 us 9.4986 us 9.6921 us]
     new: time: [6.8815 us 6.9941 us 7.1260 us]

   read Int32Array, plain encoded, optional, no NULLs
     old: time: [250.83 us 254.36 us 258.59 us]
     new: time: [49.452 us 49.547 us 49.686 us]

   read Int32Array, plain encoded, optional, half NULLs
     old: time: [448.57 us 456.15 us 464.68 us]
     new: time: [340.68 us 349.96 us 361.22 us]

   read Int32Array, dictionary encoded, mandatory, no NULLs
     old: time: [44.508 us 45.301 us 46.256 us]
     new: time: [162.29 us 164.37 us 166.87 us]

   read Int32Array, dictionary encoded, optional, no NULLs
     old: time: [336.00 us 344.43 us 353.51 us]
     new: time: [233.54 us 241.86 us 251.34 us]

   read Int32Array, dictionary encoded, optional, half NULLs
     old: time: [458.47 us 468.36 us 481.06 us]
     new: time: [464.21 us 470.32 us 477.61 us]

   read StringArray, plain encoded, mandatory, no NULLs
     old: time: [1.5856 ms 1.5996 ms 1.6168 ms]
     new: time: [312.25 us 314.47 us 317.58 us]

   read StringArray, plain encoded, optional, no NULLs
     old: time: [1.7269 ms 1.7466 ms 1.7679 ms]
     new: time: [332.59 us 335.79 us 339.89 us]

   read StringArray, plain encoded, optional, half NULLs
     old: time: [1.4635 ms 1.4821 ms 1.5060 ms]
     new: time: [533.63 us 540.17 us 548.34 us]

   read StringArray, dictionary encoded, mandatory, no NULLs
     old: time: [1.4385 ms 1.4566 ms 1.4804 ms]
     new: time: [410.96 us 417.04 us 423.86 us]

   read StringArray, dictionary encoded, optional, no NULLs
     old: time: [1.5751 ms 1.5966 ms 1.6222 ms]
     new: time: [456.19 us 462.95 us 470.83 us]

   read StringArray, dictionary encoded, optional, half NULLs
     old: time: [1.3197 ms 1.3354 ms 1.3561 ms]
     new: time: [585.26 us 595.95 us 608.60 us]
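
   To make the table above easier to scan, here is a small Rust sketch that computes the old/new ratio of the middle (point) estimates and flags rows whose [lower, upper] intervals overlap. The short row labels, the function names and the "overlap means no clear difference" rule are illustrative assumptions, not Criterion's own change-detection statistics. All times are converted to microseconds (ms values multiplied by 1000).

```rust
// Hypothetical summary of the benchmark results: speedup = old / new
// (point estimates) plus a crude interval-overlap check.

struct Row {
    name: &'static str,
    old: [f64; 3], // [lower bound, estimate, upper bound] in microseconds
    new: [f64; 3],
}

fn row(name: &'static str, old: [f64; 3], new: [f64; 3]) -> Row {
    Row { name, old, new }
}

fn rows() -> Vec<Row> {
    vec![
        row("Int32, plain, mandatory, no NULLs", [9.3360, 9.4986, 9.6921], [6.8815, 6.9941, 7.1260]),
        row("Int32, plain, optional, no NULLs", [250.83, 254.36, 258.59], [49.452, 49.547, 49.686]),
        row("Int32, plain, optional, half NULLs", [448.57, 456.15, 464.68], [340.68, 349.96, 361.22]),
        row("Int32, dictionary, mandatory, no NULLs", [44.508, 45.301, 46.256], [162.29, 164.37, 166.87]),
        row("Int32, dictionary, optional, no NULLs", [336.00, 344.43, 353.51], [233.54, 241.86, 251.34]),
        row("Int32, dictionary, optional, half NULLs", [458.47, 468.36, 481.06], [464.21, 470.32, 477.61]),
        row("String, plain, mandatory, no NULLs", [1585.6, 1599.6, 1616.8], [312.25, 314.47, 317.58]),
        row("String, plain, optional, no NULLs", [1726.9, 1746.6, 1767.9], [332.59, 335.79, 339.89]),
        row("String, plain, optional, half NULLs", [1463.5, 1482.1, 1506.0], [533.63, 540.17, 548.34]),
        row("String, dictionary, mandatory, no NULLs", [1438.5, 1456.6, 1480.4], [410.96, 417.04, 423.86]),
        row("String, dictionary, optional, no NULLs", [1575.1, 1596.6, 1622.2], [456.19, 462.95, 470.83]),
        row("String, dictionary, optional, half NULLs", [1319.7, 1335.4, 1356.1], [585.26, 595.95, 608.60]),
    ]
}

/// Ratio of point estimates; > 1.0 means the new reader is faster.
fn speedup(r: &Row) -> f64 {
    r.old[1] / r.new[1]
}

/// True if the old and new [lower, upper] intervals share any point.
fn intervals_overlap(r: &Row) -> bool {
    r.old[0] <= r.new[2] && r.new[0] <= r.old[2]
}

fn verdict(r: &Row) -> &'static str {
    if intervals_overlap(r) {
        "no clear difference"
    } else if speedup(r) > 1.0 {
        "new faster"
    } else {
        "new slower"
    }
}

fn main() {
    for r in rows() {
        println!("{:<42} {:>5.2}x  {}", r.name, speedup(&r), verdict(&r));
    }
}
```

   Under this rough heuristic only the Int32 dictionary-encoded mandatory case comes out slower, and the Int32 dictionary-encoded optional half-NULLs case is a tie; every other row shows the new reader ahead.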
   
   
   And here are the latest changes: 
https://github.com/yordan-pavlov/arrow/commit/8f4dcb1b9b0fafb6df612b39231fb585163dd6fb



