klion26 commented on PR #7987:
URL: https://github.com/apache/arrow-rs/pull/7987#issuecomment-3111877503

   @alamb @scovich @viirya, please help review this when you're free, thanks.
   
   I've benchmarked several implementations against `main`. The current implementation (option 1) is the fastest on the JSON benchmarks. The candidates are:
   
   1. Current implementation with `PackedU32Iterator`
   2. Splice with `Iterator` ([code here](https://github.com/klion26/arrow-rs/commit/1d594b3b4a461a44ad72ecac730cbdfc537767d4#diff-19c7b0b0d73ef11489af7932f49046a19ec7790896a8960add5a3ded21d5657aR1220))
   3. Collect the header with an iterator before the splice ([code here](https://github.com/klion26/arrow-rs/blob/7179a56258429d8431273d525ced836dd706e3e4/parquet-variant/src/builder.rs#L1238))
   4. Splice with `[1u8, header_size]`, then fill in the header ([code here](https://github.com/klion26/arrow-rs/blob/9cc1b04a007c54274db81059d317747b2512e169/parquet-variant/src/builder.rs#L1282))
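
For readers comparing options 2 to 4, here is a minimal sketch of the three splice strategies. It is not the code from the linked commits. The function names (`packed_offsets`, `splice_with_iterator`, `collect_then_splice`, `placeholder_then_fill`) are hypothetical, the header layout (one header byte followed by fixed-width little-endian offsets) is a simplifying assumption, and the `0u8` placeholder value is arbitrary.

```rust
use std::iter;

// Hypothetical helper (not from the PR): yields each offset as `nbytes`
// little-endian bytes, without building an intermediate Vec.
fn packed_offsets<'a>(offsets: &'a [u32], nbytes: usize) -> impl Iterator<Item = u8> + 'a {
    offsets
        .iter()
        .flat_map(move |o| IntoIterator::into_iter(o.to_le_bytes()).take(nbytes))
}

// Option 2 style: splice the header straight from an iterator.
fn splice_with_iterator(buf: &mut Vec<u8>, start: usize, header_byte: u8, offsets: &[u32], nbytes: usize) {
    let header = iter::once(header_byte).chain(packed_offsets(offsets, nbytes));
    // An empty range inserts at `start`; the splice is applied when the
    // returned `Splice` value is dropped at the end of this statement.
    buf.splice(start..start, header);
}

// Option 3 style: collect the header into a temporary Vec, then splice it in.
fn collect_then_splice(buf: &mut Vec<u8>, start: usize, header_byte: u8, offsets: &[u32], nbytes: usize) {
    let header: Vec<u8> = iter::once(header_byte)
        .chain(packed_offsets(offsets, nbytes))
        .collect();
    buf.splice(start..start, header);
}

// Option 4 style: splice in placeholder bytes of the final header size,
// then overwrite them in place.
fn placeholder_then_fill(buf: &mut Vec<u8>, start: usize, header_byte: u8, offsets: &[u32], nbytes: usize) {
    let header_size = 1 + offsets.len() * nbytes;
    buf.splice(start..start, iter::repeat(0u8).take(header_size));
    buf[start] = header_byte;
    let mut pos = start + 1;
    for o in offsets {
        buf[pos..pos + nbytes].copy_from_slice(&o.to_le_bytes()[..nbytes]);
        pos += nbytes;
    }
}

fn main() {
    // Three already-written value bytes; the header goes in front of them.
    let values: Vec<u8> = vec![0xAA, 0xBB, 0xCC];
    let offsets = [0u32, 1, 3];

    let mut a = values.clone();
    let mut b = values.clone();
    let mut c = values.clone();
    splice_with_iterator(&mut a, 0, 0x03, &offsets, 1);
    collect_then_splice(&mut b, 0, 0x03, &offsets, 1);
    placeholder_then_fill(&mut c, 0, 0x03, &offsets, 1);

    // All three strategies must produce identical bytes.
    assert_eq!(a, b);
    assert_eq!(b, c);
    println!("{:?}", a);
}
```

The `Vec::splice` documentation notes that it avoids a temporary allocation when the lower bound of the replacement iterator's `size_hint()` is exact. A `Vec` or `repeat(..).take(n)` reports an exact bound, while a `flat_map` chain like the one in `packed_offsets` generally does not, so the three variants can differ in cost even though they produce the same bytes.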
   
   
   Benchmark comparison results from my laptop:
   
   ## 1 `PackedU32Iterator`
   ```
   group                                                                7977_packedu32_iterator                main
   -----                                                                -----------------------                ----
   batch_json_string_to_variant json_list 8k string                     1.00     41.7±5.53ms        ? ?/sec    1.22     51.0±7.14ms        ? ?/sec
   batch_json_string_to_variant random_json(2633 bytes per document)    1.00   414.0±41.45ms        ? ?/sec    1.11   458.7±48.08ms        ? ?/sec
   batch_json_string_to_variant repeated_struct 8k string               1.00     15.7±2.04ms        ? ?/sec    1.01     15.9±1.67ms        ? ?/sec
   variant_get_primitive                                                1.09      2.7±0.34ms        ? ?/sec    1.00      2.5±0.28ms        ? ?/sec
   ```
   
   ## 2 Splice with `Iterator`
   ```
   group                                                                7977_avoid_allocation_for_list_builder    main
   -----                                                                --------------------------------------    ----
   batch_json_string_to_variant json_list 8k string                     1.00     46.7±6.23ms        ? ?/sec       1.09     51.0±7.14ms        ? ?/sec
   batch_json_string_to_variant random_json(2633 bytes per document)    1.00   418.0±42.38ms        ? ?/sec       1.10   458.7±48.08ms        ? ?/sec
   batch_json_string_to_variant repeated_struct 8k string               1.00     15.9±1.97ms        ? ?/sec       1.00     15.9±1.67ms        ? ?/sec
   variant_get_primitive                                                1.01      2.5±0.28ms        ? ?/sec       1.00      2.5±0.28ms        ? ?/sec
   ```
   
   ## 3 Collect the header with an iterator before the splice
   ```
   group                                                                7977_collect_before_splice             main
   -----                                                                --------------------------             ----
   batch_json_string_to_variant json_list 8k string                     1.00     46.4±4.60ms        ? ?/sec    1.10     51.0±7.14ms        ? ?/sec
   batch_json_string_to_variant random_json(2633 bytes per document)    1.00   424.5±43.27ms        ? ?/sec    1.08   458.7±48.08ms        ? ?/sec
   batch_json_string_to_variant repeated_struct 8k string               1.00     15.9±1.83ms        ? ?/sec    1.00     15.9±1.67ms        ? ?/sec
   variant_get_primitive                                                1.02      2.5±0.31ms        ? ?/sec    1.00      2.5±0.28ms        ? ?/sec
   ```
   
   ## 4 Splice with `[1u8, header_size]`, then fill in the header
   
   ```
   group                                                                7977_fill_before_splice                main
   -----                                                                -----------------------                ----
   batch_json_string_to_variant json_list 8k string                     1.00     45.1±2.68ms        ? ?/sec    1.13     51.0±7.14ms        ? ?/sec
   batch_json_string_to_variant random_json(2633 bytes per document)    1.00   419.6±40.92ms        ? ?/sec    1.09   458.7±48.08ms        ? ?/sec
   batch_json_string_to_variant repeated_struct 8k string               1.04     16.5±1.20ms        ? ?/sec    1.00     15.9±1.67ms        ? ?/sec
   variant_get_primitive                                                1.12      2.8±0.26ms        ? ?/sec    1.00      2.5±0.28ms        ? ?/sec
   ```
   

