etseidl commented on PR #6281: URL: https://github.com/apache/arrow-rs/pull/6281#issuecomment-2302741114
I agree that results from the compression bench are needed. Running your test code in release mode speeds things up dramatically.

```
Debug
init with macro, num_elems:10000, cost:7.771µs
init with resize, num_elems:20000, cost:130.553µs
init with set_len, num_elems:30000, cost:7.661µs

Release
init with macro, num_elems:10000, cost:100ns
init with resize, num_elems:20000, cost:7.49µs
init with set_len, num_elems:30000, cost:80ns
```

`resize` is still much slower, but it is roughly 17X faster with optimizations enabled (130.553µs in debug vs. 7.49µs in release). I'd be curious to see how overall throughput through the decompressor is affected by this change.

FWIW I tried replacing another [use](https://github.com/apache/arrow-rs/blob/30db5dce9ca0996457063f1b5308076a6372c438/parquet/src/arrow/array_reader/fixed_len_byte_array.rs#L467) of `resize` with the `reserve`/`write_bytes`/`set_len` approach proposed here and saw a modest (2-4%) speedup in decoding times.

```
group                                                                                                   new_resize                             to_prim
-----                                                                                                   ----------                             -------
arrow_array_reader/BYTE_STREAM_SPLIT/Decimal128Array/byte_stream_split encoded, mandatory, no NULLs     1.00    411.7±2.05µs        ? ?/sec    1.04    429.1±7.35µs        ? ?/sec
arrow_array_reader/BYTE_STREAM_SPLIT/Decimal128Array/byte_stream_split encoded, optional, half NULLs    1.00    523.3±2.65µs        ? ?/sec    1.02    532.7±4.72µs        ? ?/sec
arrow_array_reader/BYTE_STREAM_SPLIT/Decimal128Array/byte_stream_split encoded, optional, no NULLs      1.00    414.0±3.41µs        ? ?/sec    1.04    428.6±6.21µs        ? ?/sec
arrow_array_reader/BYTE_STREAM_SPLIT/Float16Array/byte_stream_split encoded, mandatory, no NULLs        1.00     52.1±0.26µs        ? ?/sec    1.04     54.1±0.78µs        ? ?/sec
arrow_array_reader/BYTE_STREAM_SPLIT/Float16Array/byte_stream_split encoded, optional, half NULLs       1.00    109.8±0.65µs        ? ?/sec    1.04    113.7±1.61µs        ? ?/sec
arrow_array_reader/BYTE_STREAM_SPLIT/Float16Array/byte_stream_split encoded, optional, no NULLs         1.00     56.7±1.02µs        ? ?/sec    1.04     59.0±0.52µs        ? ?/sec
```
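For anyone comparing the three strategies in the timings, here is a minimal Rust sketch of what each one does. It is not the test code from the PR. The function names (`init_with_macro`, `init_with_resize`, `init_with_set_len`, `extend_zeroed`) are hypothetical. `extend_zeroed` assumes the general "append `n` zero bytes to an existing buffer" shape and has not been checked against the exact call site in `fixed_len_byte_array.rs`. The timing loop takes a single rough measurement and is not a Criterion benchmark.

```rust
use std::hint::black_box;
use std::time::Instant;

// Hypothetical helper names; a sketch of the three strategies, not the PR's code.

/// Zeroed buffer via the `vec!` macro (the "macro" case).
fn init_with_macro(n: usize) -> Vec<u8> {
    vec![0u8; n]
}

/// Zeroed buffer by growing an empty Vec with `resize` (the "resize" case).
fn init_with_resize(n: usize) -> Vec<u8> {
    let mut v = Vec::new();
    v.resize(n, 0u8);
    v
}

/// Append `n` zero bytes to `buf` using reserve + write_bytes + set_len,
/// as an alternative to `buf.resize(buf.len() + n, 0)`.
fn extend_zeroed(buf: &mut Vec<u8>, n: usize) {
    let old_len = buf.len();
    buf.reserve(n); // capacity is now at least old_len + n
    // SAFETY: after `reserve(n)` the allocation holds at least `old_len + n`
    // bytes, so writing `n` bytes starting at offset `old_len` stays in bounds,
    // and every byte in `old_len..old_len + n` is initialized before `set_len`.
    unsafe {
        std::ptr::write_bytes(buf.as_mut_ptr().add(old_len), 0, n);
        buf.set_len(old_len + n);
    }
}

/// Zeroed buffer via `extend_zeroed` on an empty Vec (the "set_len" case).
fn init_with_set_len(n: usize) -> Vec<u8> {
    let mut v = Vec::new();
    extend_zeroed(&mut v, n);
    v
}

fn main() {
    let n = 10_000;
    let strategies: [(&str, fn(usize) -> Vec<u8>); 3] = [
        ("macro", init_with_macro),
        ("resize", init_with_resize),
        ("set_len", init_with_set_len),
    ];
    // One rough timing per strategy; build with `--release` for optimized numbers.
    for (name, f) in strategies {
        let start = Instant::now();
        let v = black_box(f(black_box(n)));
        let cost = start.elapsed();
        assert!(v.len() == n && v.iter().all(|&b| b == 0));
        println!("init with {name}, num_elems:{n}, cost:{cost:?}");
    }
}
```

The macro case may be nearly free because, for zero values of primitive types, `vec![0u8; n]` can request already-zeroed memory from the allocator instead of writing the bytes itself. The `set_len` path avoids `resize`'s generic element-by-element fill, which the optimizer may or may not turn into a `memset`. The cost is an `unsafe` block whose invariants have to be upheld by hand: capacity must be reserved, and the bytes must be written before the length is increased.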
