ShiKaiWi commented on issue #5250: URL: https://github.com/apache/arrow-rs/issues/5250#issuecomment-1871134531
After running the command `cargo bench --bench arrow_reader --features="arrow test_common experimental" -- StringArray/dictionary` in the `parquet` source directory, I find that the mentioned problem has already been fixed on the master branch. With the changeset https://github.com/ShiKaiWi/arrow-rs/commit/d4e905a6cc337f10c61f47d75f264df82fc97242, the performance drops:

```
arrow_array_reader/StringArray/dictionary encoded, mandatory, no NULLs
                        time:   [152.71 µs 153.63 µs 154.51 µs]
                        change: [+9.9536% +11.737% +13.274%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

arrow_array_reader/StringArray/dictionary encoded, optional, no NULLs
                        time:   [146.80 µs 147.41 µs 148.10 µs]
                        change: [+14.007% +14.516% +14.970%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

arrow_array_reader/StringArray/dictionary encoded, optional, half NULLs
                        time:   [174.27 µs 176.84 µs 179.28 µs]
                        change: [+5.0935% +6.4985% +7.8113%] (p = 0.00 < 0.05)
                        Performance has regressed.
```

The changeset https://github.com/ShiKaiWi/arrow-rs/commit/d4e905a6cc337f10c61f47d75f264df82fc97242 only works for `parquet` v43.

@tustvold sorry to bother you. :sweat_smile:

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
