Omega359 opened a new issue, #14210:
URL: https://github.com/apache/datafusion/issues/14210

   ### Is your feature request related to a problem or challenge?
   
   I came across a [library](https://github.com/ashvardanian/StringZilla)
   (StringZilla) that accelerates some string operations (primarily `find`, for
   DataFusion's use case) using SIMD. It seems to be better optimized for larger
   strings than the `memchr::memmem::find` that arrow currently uses. For smaller
   strings the memchr crate performs about the same or slightly better in several
   cases on my machine, but for longer strings (128 bytes and up) StringZilla
   outperforms it significantly, by roughly 2.6x to 32x in the results below. The
   code changes required to support it are minimal, though I would like to see
   performance comparisons on other architectures (I ran the following on my
   i9-13900H laptop).
   
   ```
   # cargo bench --bench contains --bench strpos -- --save-baseline main
   # <switch to stringzilla branch>
   # cargo bench --bench contains --bench strpos -- --save-baseline stringzilla
   # critcmp main stringzilla
   group                                          main                                    stringzilla
   -----                                          ----                                    -----------
   contains_StringArray_ascii_str_len_1024        32.47    16.1±0.95ms        ? ?/sec     1.00  496.2±114.11µs        ? ?/sec
   contains_StringArray_ascii_str_len_128         16.00     3.4±0.14ms        ? ?/sec     1.00   214.3±71.34µs        ? ?/sec
   contains_StringArray_ascii_str_len_32          1.00   185.0±24.42µs        ? ?/sec     1.01   187.2±20.89µs        ? ?/sec
   contains_StringArray_ascii_str_len_4096        15.92    50.1±5.63ms        ? ?/sec     1.00      3.1±0.26ms        ? ?/sec
   contains_StringArray_ascii_str_len_8           1.04     97.4±9.77µs        ? ?/sec     1.00     94.0±7.62µs        ? ?/sec
   contains_StringArray_utf8_str_len_1024         26.04    24.8±2.85ms        ? ?/sec     1.00  951.0±112.52µs        ? ?/sec
   contains_StringArray_utf8_str_len_128          14.07     5.1±0.44ms        ? ?/sec     1.00   362.2±85.63µs        ? ?/sec
   contains_StringArray_utf8_str_len_32           1.50   649.9±72.47µs        ? ?/sec     1.00   432.1±63.06µs        ? ?/sec
   contains_StringArray_utf8_str_len_4096         16.58    80.3±8.61ms        ? ?/sec     1.00      4.8±0.41ms        ? ?/sec
   contains_StringArray_utf8_str_len_8            1.00   236.2±29.07µs        ? ?/sec     1.08   254.4±16.59µs        ? ?/sec
   contains_StringViewArray_ascii_str_len_1024    31.25    16.1±0.85ms        ? ?/sec     1.00   515.8±99.46µs        ? ?/sec
   contains_StringViewArray_ascii_str_len_128     17.33     3.5±0.36ms        ? ?/sec     1.00   204.8±20.87µs        ? ?/sec
   contains_StringViewArray_ascii_str_len_32      1.06   192.6±27.58µs        ? ?/sec     1.00    182.2±9.15µs        ? ?/sec
   contains_StringViewArray_ascii_str_len_4096    14.90    49.0±4.56ms        ? ?/sec     1.00      3.3±0.42ms        ? ?/sec
   contains_StringViewArray_ascii_str_len_8       1.00     98.0±9.80µs        ? ?/sec     1.03    100.7±8.74µs        ? ?/sec
   contains_StringViewArray_utf8_str_len_1024     22.66    24.2±2.08ms        ? ?/sec     1.00  1068.3±369.70µs        ? ?/sec
   contains_StringViewArray_utf8_str_len_128      14.90     5.1±0.49ms        ? ?/sec     1.00   343.2±32.31µs        ? ?/sec
   contains_StringViewArray_utf8_str_len_32       1.49   623.7±43.61µs        ? ?/sec     1.00   417.7±31.81µs        ? ?/sec
   contains_StringViewArray_utf8_str_len_4096     15.93    79.7±8.05ms        ? ?/sec     1.00      5.0±0.63ms        ? ?/sec
   contains_StringViewArray_utf8_str_len_8        1.00   246.2±52.73µs        ? ?/sec     1.13   277.5±31.61µs        ? ?/sec
   strpos_StringArray_ascii_str_len_1024          9.96      9.8±1.15ms        ? ?/sec     1.00  986.1±116.33µs        ? ?/sec
   strpos_StringArray_ascii_str_len_128           3.39  1335.0±145.51µs        ? ?/sec    1.00   393.3±40.23µs        ? ?/sec
   strpos_StringArray_ascii_str_len_32            1.06   420.7±92.47µs        ? ?/sec     1.00   396.0±57.32µs        ? ?/sec
   strpos_StringArray_ascii_str_len_4096          5.85     40.6±3.68ms        ? ?/sec     1.00      6.9±0.84ms        ? ?/sec
   strpos_StringArray_ascii_str_len_8             1.07   150.4±12.31µs        ? ?/sec     1.00   141.0±10.02µs        ? ?/sec
   strpos_StringArray_utf8_str_len_1024           20.27    26.7±2.58ms        ? ?/sec     1.00  1316.5±127.39µs        ? ?/sec
   strpos_StringArray_utf8_str_len_128            6.71      3.9±0.27ms        ? ?/sec     1.00   586.3±88.39µs        ? ?/sec
   strpos_StringArray_utf8_str_len_32             2.45  1653.1±158.90µs        ? ?/sec    1.00   673.5±62.95µs        ? ?/sec
   strpos_StringArray_utf8_str_len_4096           14.90   101.7±4.95ms        ? ?/sec     1.00      6.8±0.79ms        ? ?/sec
   strpos_StringArray_utf8_str_len_8              1.82   713.9±45.84µs        ? ?/sec     1.00   392.3±44.63µs        ? ?/sec
   strpos_StringViewArray_ascii_str_len_1024      7.73      9.0±1.26ms        ? ?/sec     1.00  1167.2±114.81µs        ? ?/sec
   strpos_StringViewArray_ascii_str_len_128       2.59  1240.3±130.47µs        ? ?/sec    1.00   478.0±48.19µs        ? ?/sec
   strpos_StringViewArray_ascii_str_len_32        1.00   395.4±44.10µs        ? ?/sec     1.16   457.0±47.80µs        ? ?/sec
   strpos_StringViewArray_ascii_str_len_4096      4.42     36.2±2.29ms        ? ?/sec     1.00      8.2±1.16ms        ? ?/sec
   strpos_StringViewArray_ascii_str_len_8         1.00   192.9±14.14µs        ? ?/sec     1.06   204.2±18.46µs        ? ?/sec
   strpos_StringViewArray_utf8_str_len_1024       17.67    27.0±2.46ms        ? ?/sec     1.00  1526.3±205.96µs        ? ?/sec
   strpos_StringViewArray_utf8_str_len_128        6.85      4.1±0.40ms        ? ?/sec     1.00   593.2±51.54µs        ? ?/sec
   strpos_StringViewArray_utf8_str_len_32         2.49  1697.1±319.77µs        ? ?/sec    1.00   680.8±50.25µs        ? ?/sec
   strpos_StringViewArray_utf8_str_len_4096       13.95   103.3±7.00ms        ? ?/sec     1.00      7.4±0.91ms        ? ?/sec
   strpos_StringViewArray_utf8_str_len_8          1.93  776.5±127.84µs        ? ?/sec     1.00   401.6±78.46µs        ? ?/sec
   ```
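
For anyone reading the table for the first time: as far as I understand critcmp, the number in front of each timing is that baseline's time divided by the fastest time in the row, so the faster side shows 1.00. The sketch below reproduces that arithmetic; the function name `relative_factors` is made up for illustration and is not part of critcmp.

```rust
/// Returns (main_factor, candidate_factor): each timing divided by the
/// faster of the two, which is how the ratio columns above appear to be
/// derived. Both inputs must be in the same unit (microseconds here).
fn relative_factors(main_us: f64, candidate_us: f64) -> (f64, f64) {
    let fastest = main_us.min(candidate_us);
    (main_us / fastest, candidate_us / fastest)
}
```

Applied to the rounded figures for `contains_StringArray_ascii_str_len_1024` (16.1 ms against 496.2 µs), this gives a factor of about 32.4 for `main`, in line with the 32.47 that critcmp reports from the unrounded measurements.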
   
   
   ### Describe the solution you'd like
   
   Incorporate StringZilla into DataFusion functions where appropriate.
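
One possible reading of "where appropriate" is to choose the search backend by haystack length, since the benchmarks show the two libraries roughly even at 8 and 32 bytes and StringZilla well ahead from 128 bytes up. The following is only a sketch under assumptions: the `Backend` enum, the 64-byte `LONG_HAYSTACK_THRESHOLD`, and the `naive_find` stand-in are all hypothetical, and neither the memchr nor the StringZilla API is actually called here.

```rust
/// Which substring-search backend to use for a given haystack (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    /// Stand-in for the current `memchr::memmem::find` path.
    Memchr,
    /// Stand-in for a StringZilla-backed path.
    StringZilla,
}

/// Assumed cutoff in bytes. The benchmarks only show that the crossover
/// lies somewhere between 32 and 128 bytes on one machine; 64 is a guess.
const LONG_HAYSTACK_THRESHOLD: usize = 64;

/// Picks a backend from the haystack length alone.
fn pick_backend(haystack_len: usize) -> Backend {
    if haystack_len >= LONG_HAYSTACK_THRESHOLD {
        Backend::StringZilla
    } else {
        Backend::Memchr
    }
}

/// Plain std-only substring search, used here in place of both real libraries.
fn naive_find(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    if needle.is_empty() {
        // `windows(0)` would panic; an empty needle matches at offset 0.
        return Some(0);
    }
    haystack.windows(needle.len()).position(|w| w == needle)
}

/// Dispatches to the chosen backend. Both arms call the same stand-in,
/// so only the selection logic is being illustrated.
fn find(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    match pick_backend(haystack.len()) {
        Backend::Memchr => naive_find(haystack, needle),
        Backend::StringZilla => naive_find(haystack, needle),
    }
}
```

Whether a per-call length check is worth its cost, and where the threshold belongs on other architectures, would need the additional benchmarks requested above.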
   
   ### Describe alternatives you've considered
   
   Leave the code as is.
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

