yordan-pavlov commented on issue #200: URL: https://github.com/apache/arrow-rs/issues/200#issuecomment-848189214
UPDATE: I did some cleanup of the code over the weekend, with only minor changes to performance. Sadly, the new code is still slower in the `read Int32Array, dictionary encoded, mandatory, no NULLs` benchmark (roughly 148 us versus 45 us for the old code). I think caching the value slices in the row group context could help, but that requires a self-referencing struct, which is not an easy thing to do. Reading dictionary-encoded pages as arrow dictionary arrays could also improve performance further and completely eliminate the remaining performance issues, but that is a topic for another PR.

With that said, because the new implementation is faster in all other benchmarks (in some cases more than 5 times faster) and is also cleaner, I still think it is a significant improvement and should be merged. So, in the next few days I will rebase on the latest arrow-rs/master and create a PR.

Here are the latest benchmark results:

read Int32Array, plain encoded, mandatory, no NULLs - old: time: [8.8516 us 8.9323 us 9.0332 us]
read Int32Array, plain encoded, mandatory, no NULLs - new: time: [6.9189 us 7.0340 us 7.1556 us]
read Int32Array, plain encoded, optional, no NULLs - old: time: [410.12 us 423.31 us 437.25 us]
read Int32Array, plain encoded, optional, no NULLs - new: time: [56.863 us 60.122 us 63.467 us]
read Int32Array, plain encoded, optional, half NULLs - old: time: [467.45 us 477.59 us 489.31 us]
read Int32Array, plain encoded, optional, half NULLs - new: time: [331.12 us 337.25 us 344.02 us]
read Int32Array, dictionary encoded, mandatory, no NULLs - old: time: [43.921 us 44.525 us 45.309 us]
read Int32Array, dictionary encoded, mandatory, no NULLs - new: time: [146.66 us 148.39 us 150.40 us]
read Int32Array, dictionary encoded, optional, no NULLs - old: time: [304.69 us 310.85 us 317.45 us]
read Int32Array, dictionary encoded, optional, no NULLs - new: time: [195.72 us 199.61 us 203.64 us]
read Int32Array, dictionary encoded, optional, half NULLs - old: time: [476.98 us 486.63 us 497.43 us]
read Int32Array, dictionary encoded, optional, half NULLs - new: time: [401.85 us 408.88 us 416.63 us]
read StringArray, plain encoded, mandatory, no NULLs - old: time: [1.6361 ms 1.6459 ms 1.6561 ms]
read StringArray, plain encoded, mandatory, no NULLs - new: time: [301.92 us 311.13 us 320.87 us]
read StringArray, plain encoded, optional, no NULLs - old: time: [2.0373 ms 2.0751 ms 2.1182 ms]
read StringArray, plain encoded, optional, no NULLs - new: time: [343.34 us 350.98 us 359.35 us]
read StringArray, plain encoded, optional, half NULLs - old: time: [1.4777 ms 1.4999 ms 1.5247 ms]
read StringArray, plain encoded, optional, half NULLs - new: time: [587.13 us 605.02 us 626.96 us]
read StringArray, dictionary encoded, mandatory, no NULLs - old: time: [1.4503 ms 1.4681 ms 1.4897 ms]
read StringArray, dictionary encoded, mandatory, no NULLs - new: time: [269.72 us 275.22 us 281.45 us]
read StringArray, dictionary encoded, optional, no NULLs - old: time: [1.5541 ms 1.5718 ms 1.5911 ms]
read StringArray, dictionary encoded, optional, no NULLs - new: time: [325.79 us 336.16 us 348.02 us]
read StringArray, dictionary encoded, optional, half NULLs - old: time: [1.3192 ms 1.3395 ms 1.3625 ms]
read StringArray, dictionary encoded, optional, half NULLs - new: time: [501.84 us 520.28 us 545.06 us]

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
