Omega359 opened a new issue, #14210: URL: https://github.com/apache/datafusion/issues/14210
### Is your feature request related to a problem or challenge?

I came across a [library](https://github.com/ashvardanian/StringZilla) that accelerates some string operations (primarily find, for DataFusion's use case) using SIMD. It seems to be better optimized for larger strings than the `memchr::memmem::find` that is being used by arrow. For smaller strings on my machine the memchr crate seems to perform better, but for longer strings stringzilla seems to outperform it quite significantly.

The code changes required to support it are minimal, though I would like to see performance comparisons on other architectures (I ran the following on my i9-13900h laptop).

```
# cargo bench --bench contains --bench strpos -- --save-baseline main
# <switch to stringzilla branch>
# cargo bench --bench contains --bench strpos -- --save-baseline stringzilla
# critcmp main stringzilla
group                                        main                               stringzilla
-----                                        ----                               -----------
contains_StringArray_ascii_str_len_1024      32.47  16.1±0.95ms      ? ?/sec     1.00  496.2±114.11µs   ? ?/sec
contains_StringArray_ascii_str_len_128       16.00  3.4±0.14ms       ? ?/sec     1.00  214.3±71.34µs    ? ?/sec
contains_StringArray_ascii_str_len_32         1.00  185.0±24.42µs    ? ?/sec     1.01  187.2±20.89µs    ? ?/sec
contains_StringArray_ascii_str_len_4096      15.92  50.1±5.63ms      ? ?/sec     1.00  3.1±0.26ms       ? ?/sec
contains_StringArray_ascii_str_len_8          1.04  97.4±9.77µs      ? ?/sec     1.00  94.0±7.62µs      ? ?/sec
contains_StringArray_utf8_str_len_1024       26.04  24.8±2.85ms      ? ?/sec     1.00  951.0±112.52µs   ? ?/sec
contains_StringArray_utf8_str_len_128        14.07  5.1±0.44ms       ? ?/sec     1.00  362.2±85.63µs    ? ?/sec
contains_StringArray_utf8_str_len_32          1.50  649.9±72.47µs    ? ?/sec     1.00  432.1±63.06µs    ? ?/sec
contains_StringArray_utf8_str_len_4096       16.58  80.3±8.61ms      ? ?/sec     1.00  4.8±0.41ms       ? ?/sec
contains_StringArray_utf8_str_len_8           1.00  236.2±29.07µs    ? ?/sec     1.08  254.4±16.59µs    ? ?/sec
contains_StringViewArray_ascii_str_len_1024  31.25  16.1±0.85ms      ? ?/sec     1.00  515.8±99.46µs    ? ?/sec
contains_StringViewArray_ascii_str_len_128   17.33  3.5±0.36ms       ? ?/sec     1.00  204.8±20.87µs    ? ?/sec
contains_StringViewArray_ascii_str_len_32     1.06  192.6±27.58µs    ? ?/sec     1.00  182.2±9.15µs     ? ?/sec
contains_StringViewArray_ascii_str_len_4096  14.90  49.0±4.56ms      ? ?/sec     1.00  3.3±0.42ms       ? ?/sec
contains_StringViewArray_ascii_str_len_8      1.00  98.0±9.80µs      ? ?/sec     1.03  100.7±8.74µs     ? ?/sec
contains_StringViewArray_utf8_str_len_1024   22.66  24.2±2.08ms      ? ?/sec     1.00  1068.3±369.70µs  ? ?/sec
contains_StringViewArray_utf8_str_len_128    14.90  5.1±0.49ms       ? ?/sec     1.00  343.2±32.31µs    ? ?/sec
contains_StringViewArray_utf8_str_len_32      1.49  623.7±43.61µs    ? ?/sec     1.00  417.7±31.81µs    ? ?/sec
contains_StringViewArray_utf8_str_len_4096   15.93  79.7±8.05ms      ? ?/sec     1.00  5.0±0.63ms       ? ?/sec
contains_StringViewArray_utf8_str_len_8       1.00  246.2±52.73µs    ? ?/sec     1.13  277.5±31.61µs    ? ?/sec
strpos_StringArray_ascii_str_len_1024         9.96  9.8±1.15ms       ? ?/sec     1.00  986.1±116.33µs   ? ?/sec
strpos_StringArray_ascii_str_len_128          3.39  1335.0±145.51µs  ? ?/sec     1.00  393.3±40.23µs    ? ?/sec
strpos_StringArray_ascii_str_len_32           1.06  420.7±92.47µs    ? ?/sec     1.00  396.0±57.32µs    ? ?/sec
strpos_StringArray_ascii_str_len_4096         5.85  40.6±3.68ms      ? ?/sec     1.00  6.9±0.84ms       ? ?/sec
strpos_StringArray_ascii_str_len_8            1.07  150.4±12.31µs    ? ?/sec     1.00  141.0±10.02µs    ? ?/sec
strpos_StringArray_utf8_str_len_1024         20.27  26.7±2.58ms      ? ?/sec     1.00  1316.5±127.39µs  ? ?/sec
strpos_StringArray_utf8_str_len_128           6.71  3.9±0.27ms       ? ?/sec     1.00  586.3±88.39µs    ? ?/sec
strpos_StringArray_utf8_str_len_32            2.45  1653.1±158.90µs  ? ?/sec     1.00  673.5±62.95µs    ? ?/sec
strpos_StringArray_utf8_str_len_4096         14.90  101.7±4.95ms     ? ?/sec     1.00  6.8±0.79ms       ? ?/sec
strpos_StringArray_utf8_str_len_8             1.82  713.9±45.84µs    ? ?/sec     1.00  392.3±44.63µs    ? ?/sec
strpos_StringViewArray_ascii_str_len_1024     7.73  9.0±1.26ms       ? ?/sec     1.00  1167.2±114.81µs  ? ?/sec
strpos_StringViewArray_ascii_str_len_128      2.59  1240.3±130.47µs  ? ?/sec     1.00  478.0±48.19µs    ? ?/sec
strpos_StringViewArray_ascii_str_len_32       1.00  395.4±44.10µs    ? ?/sec     1.16  457.0±47.80µs    ? ?/sec
strpos_StringViewArray_ascii_str_len_4096     4.42  36.2±2.29ms      ? ?/sec     1.00  8.2±1.16ms       ? ?/sec
strpos_StringViewArray_ascii_str_len_8        1.00  192.9±14.14µs    ? ?/sec     1.06  204.2±18.46µs    ? ?/sec
strpos_StringViewArray_utf8_str_len_1024     17.67  27.0±2.46ms      ? ?/sec     1.00  1526.3±205.96µs  ? ?/sec
strpos_StringViewArray_utf8_str_len_128       6.85  4.1±0.40ms       ? ?/sec     1.00  593.2±51.54µs    ? ?/sec
strpos_StringViewArray_utf8_str_len_32        2.49  1697.1±319.77µs  ? ?/sec     1.00  680.8±50.25µs    ? ?/sec
strpos_StringViewArray_utf8_str_len_4096     13.95  103.3±7.00ms     ? ?/sec     1.00  7.4±0.91ms       ? ?/sec
strpos_StringViewArray_utf8_str_len_8         1.93  776.5±127.84µs   ? ?/sec     1.00  401.6±78.46µs    ? ?/sec
```

### Describe the solution you'd like

Incorporate stringzilla into DataFusion functions where appropriate.

### Describe alternatives you've considered

Leave the code as is.

### Additional context

_No response_

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org