klion26 commented on PR #7987: URL: https://github.com/apache/arrow-rs/pull/7987#issuecomment-3111877503
@alamb @scovich @viirya, please help review this when you're free, thanks.

I've created benchmarks for several implementations. The current implementation (option 1) is the winner. The implementations compared are:

1. Current implementation with `PackedU32Iterator`
2. Splice with `Iterator` ([code here](https://github.com/klion26/arrow-rs/commit/1d594b3b4a461a44ad72ecac730cbdfc537767d4#diff-19c7b0b0d73ef11489af7932f49046a19ec7790896a8960add5a3ded21d5657aR1220))
3. Collect the header with an iterator before the splice ([code here](https://github.com/klion26/arrow-rs/blob/7179a56258429d8431273d525ced836dd706e3e4/parquet-variant/src/builder.rs#L1238))
4. Splice with `[1u8, header_size]`, then fill in the header ([code here](https://github.com/klion26/arrow-rs/blob/9cc1b04a007c54274db81059d317747b2512e169/parquet-variant/src/builder.rs#L1282))

The benchmark comparison results from my laptop, each branch compared against `main`:

## 1 PackedU32 Iterator

```
group                                                              7977_packedu32_iterator       main
-----                                                              -----------------------       ----
batch_json_string_to_variant json_list 8k string                   1.00    41.7±5.53ms  ? ?/sec  1.22    51.0±7.14ms  ? ?/sec
batch_json_string_to_variant random_json(2633 bytes per document)  1.00  414.0±41.45ms  ? ?/sec  1.11  458.7±48.08ms  ? ?/sec
batch_json_string_to_variant repeated_struct 8k string             1.00    15.7±2.04ms  ? ?/sec  1.01    15.9±1.67ms  ? ?/sec
variant_get_primitive                                              1.09     2.7±0.34ms  ? ?/sec  1.00     2.5±0.28ms  ? ?/sec
```

## 2 Splice with `Iterator`

```
group                                                              7977_avoid_allocation_for_list_builder  main
-----                                                              --------------------------------------  ----
batch_json_string_to_variant json_list 8k string                   1.00    46.7±6.23ms  ? ?/sec            1.09    51.0±7.14ms  ? ?/sec
batch_json_string_to_variant random_json(2633 bytes per document)  1.00  418.0±42.38ms  ? ?/sec            1.10  458.7±48.08ms  ? ?/sec
batch_json_string_to_variant repeated_struct 8k string             1.00    15.9±1.97ms  ? ?/sec            1.00    15.9±1.67ms  ? ?/sec
variant_get_primitive                                              1.01     2.5±0.28ms  ? ?/sec            1.00     2.5±0.28ms  ? ?/sec
```

## 3 Collect the header with iterator before splice

```
group                                                              7977_collect_before_splice    main
-----                                                              --------------------------    ----
batch_json_string_to_variant json_list 8k string                   1.00    46.4±4.60ms  ? ?/sec  1.10    51.0±7.14ms  ? ?/sec
batch_json_string_to_variant random_json(2633 bytes per document)  1.00  424.5±43.27ms  ? ?/sec  1.08  458.7±48.08ms  ? ?/sec
batch_json_string_to_variant repeated_struct 8k string             1.00    15.9±1.83ms  ? ?/sec  1.00    15.9±1.67ms  ? ?/sec
variant_get_primitive                                              1.02     2.5±0.31ms  ? ?/sec  1.00     2.5±0.28ms  ? ?/sec
```

## 4 Splice with `[1u8,header_size]` followed by filling the header

```
group                                                              7977_fill_before_splice       main
-----                                                              -----------------------       ----
batch_json_string_to_variant json_list 8k string                   1.00    45.1±2.68ms  ? ?/sec  1.13    51.0±7.14ms  ? ?/sec
batch_json_string_to_variant random_json(2633 bytes per document)  1.00  419.6±40.92ms  ? ?/sec  1.09  458.7±48.08ms  ? ?/sec
batch_json_string_to_variant repeated_struct 8k string             1.04    16.5±1.20ms  ? ?/sec  1.00    15.9±1.67ms  ? ?/sec
variant_get_primitive                                              1.12     2.8±0.26ms  ? ?/sec  1.00     2.5±0.28ms  ? ?/sec
```
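For readers unfamiliar with the two splice-based shapes compared above (splicing an iterator directly versus splicing a placeholder and filling it afterwards), here is a minimal sketch. It is not the code from the PR: the function names `insert_with_splice` and `reserve_then_fill` are hypothetical, the buffer is assumed to be a plain `Vec<u8>`, and a zero byte is assumed as the placeholder value.

```rust
/// Hypothetical helper (not from the PR): insert `header` at `start` by
/// splicing an iterator directly into the buffer.
fn insert_with_splice<I>(buf: &mut Vec<u8>, start: usize, header: I)
where
    I: IntoIterator<Item = u8>,
{
    // The empty range `start..start` removes nothing, so this is a pure insert.
    // The insertion is completed when the returned `Splice` is dropped.
    drop(buf.splice(start..start, header));
}

/// Hypothetical helper (not from the PR): first open a gap of placeholder
/// bytes at `start`, then overwrite the gap with the real header bytes.
fn reserve_then_fill(buf: &mut Vec<u8>, start: usize, header: &[u8]) {
    // Assumption: zero is used as the placeholder byte. `repeat(..).take(n)`
    // reports an exact size hint, so the tail is shifted only once.
    let filler = std::iter::repeat(0u8).take(header.len());
    drop(buf.splice(start..start, filler));

    // Fill the reserved gap in place.
    buf[start..start + header.len()].copy_from_slice(header);
}
```

Both helpers leave the buffer in the same final state. The difference that matters for a benchmark like this is how `Vec::splice` behaves: per the standard library documentation, it may allocate a temporary vector and move the tail twice when the tail is non-empty and the iterator's `size_hint` lower bound is not exact.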
