tustvold opened a new pull request, #1861:
URL: https://github.com/apache/arrow-rs/pull/1861

   _Draft as it builds on #1860_
   
   # Which issue does this PR close?
   
   Closes #1851
   Relates to #1843
   
   # Rationale for this change
    
   `StringDictionaryBuilder` can be made significantly faster.
   
   # What changes are included in this PR?
   
   There are two major changes in this PR
   
   * Switch to `ahash`
   * Avoid storing a separate copy of each string key in the `HashMap`
   
   The first gives a ~40% uplift regardless of data shape; the latter adds a further ~10-20% performance improvement, depending on the dictionary size.
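   To illustrate the second change, here is a minimal sketch assuming a simplified design: each distinct string is written once into a contiguous byte buffer, and the hash table stores only dictionary keys (indices), so lookups hash the incoming string and compare it against bytes already in the buffer instead of against an owned key copy. `DictBuilder`, `with_hasher`, `append` and the linear-probing table are hypothetical illustrations, not the arrow-rs API or this PR's exact implementation. The hasher is a type parameter; std's `RandomState` is used here only so the sketch has no dependencies, and `ahash::RandomState` would be the drop-in for the first change.

   ```rust
   use std::collections::hash_map::RandomState;
   use std::hash::{BuildHasher, Hash, Hasher};

   /// Hypothetical, simplified string dictionary builder (not the arrow-rs type).
   /// Each distinct string is stored once in `values`; the hash table holds only
   /// dictionary keys (indices into `offsets`), never owned copies of the strings.
   struct DictBuilder<S: BuildHasher> {
       values: Vec<u8>,         // concatenated bytes of every distinct string
       offsets: Vec<usize>,     // value i is values[offsets[i]..offsets[i + 1]]
       slots: Vec<Option<u32>>, // linear-probing table of dictionary keys
       keys: Vec<u32>,          // one dictionary key per appended string
       state: S,                // hasher factory, e.g. ahash::RandomState instead of std's
   }

   impl<S: BuildHasher> DictBuilder<S> {
       fn with_hasher(state: S) -> Self {
           Self { values: Vec::new(), offsets: vec![0], slots: vec![None; 16], keys: Vec::new(), state }
       }

       fn len(&self) -> usize {
           self.offsets.len() - 1 // number of distinct values
       }

       fn hash_bytes(&self, bytes: &[u8]) -> u64 {
           let mut hasher = self.state.build_hasher();
           bytes.hash(&mut hasher);
           hasher.finish()
       }

       fn value(&self, key: u32) -> &[u8] {
           let k = key as usize;
           &self.values[self.offsets[k]..self.offsets[k + 1]]
       }

       /// Appends `s`, returning its dictionary key (reused if `s` was seen before).
       fn append(&mut self, s: &str) -> u32 {
           // Keep the load factor at or below 50% so probing always finds a free slot.
           if self.len() * 2 >= self.slots.len() {
               self.grow();
           }
           let bytes = s.as_bytes();
           let mask = self.slots.len() - 1;
           let mut i = (self.hash_bytes(bytes) as usize) & mask;
           loop {
               let slot = self.slots[i];
               match slot {
                   // Compare against the bytes already in `values`: no key copy needed.
                   Some(key) if self.value(key) == bytes => {
                       self.keys.push(key);
                       return key;
                   }
                   Some(_) => i = (i + 1) & mask,
                   None => {
                       let key = self.len() as u32;
                       self.values.extend_from_slice(bytes);
                       self.offsets.push(self.values.len());
                       self.slots[i] = Some(key);
                       self.keys.push(key);
                       return key;
                   }
               }
           }
       }

       /// Doubles the table, re-hashing each distinct value from `values`.
       fn grow(&mut self) {
           let mask = self.slots.len() * 2 - 1;
           let mut slots: Vec<Option<u32>> = vec![None; mask + 1];
           for key in 0..self.len() as u32 {
               let mut i = (self.hash_bytes(self.value(key)) as usize) & mask;
               while slots[i].is_some() {
                   i = (i + 1) & mask;
               }
               slots[i] = Some(key);
           }
           self.slots = slots;
       }
   }
   ```

   For example, appending `"foo"`, `"bar"`, `"foo"` to `DictBuilder::with_hasher(RandomState::new())` yields keys `0`, `1`, `0`, while `"foo"` and `"bar"` are each stored only once in `values`. Because the table holds plain `u32` keys, a re-hash on growth reads the strings back from `values` rather than from a second copy.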
   
   ```
   string_dictionary_builder/(dict_size:20, len:1000, key_len: 5)
                           time:   [15.148 us 15.179 us 15.213 us]
                           change: [-49.937% -49.607% -49.183%] (p = 0.00 < 0.05)
                           Performance has improved.
   string_dictionary_builder/(dict_size:100, len:1000, key_len: 5)
                           time:   [15.334 us 15.372 us 15.408 us]
                           change: [-60.780% -60.676% -60.577%] (p = 0.00 < 0.05)
                           Performance has improved.
   string_dictionary_builder/(dict_size:100, len:1000, key_len: 10)
                           time:   [14.638 us 14.653 us 14.668 us]
                           change: [-66.763% -66.716% -66.673%] (p = 0.00 < 0.05)
                           Performance has improved.
   string_dictionary_builder/(dict_size:100, len:10000, key_len: 10)
                           time:   [131.08 us 131.15 us 131.23 us]
                           change: [-61.008% -60.966% -60.922%] (p = 0.00 < 0.05)
                           Performance has improved.
   string_dictionary_builder/(dict_size:100, len:10000, key_len: 100)
                           time:   [379.73 us 379.89 us 380.06 us]
                           change: [-61.999% -61.946% -61.887%] (p = 0.00 < 0.05)
                           Performance has improved.
   ```
   
   # Are there any user-facing changes?
   
   No


-- 
This is an automated message from the Apache Git Service.