neilconway commented on PR #20385:
URL: https://github.com/apache/datafusion/pull/20385#issuecomment-3945356897
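@Jefffrey Got it; the default `hashbrown` hash function does seem like a
better choice. Interestingly, the benchmarks are significantly better in some
cases: the `array_has_any_scalar` cases at 10, 100, and 1000 run roughly
1.6x to 2x faster, while the remaining cases are flat or up to about 10%
slower. This compares the feature branch with `hashbrown` (target) against
the std `HashSet` (base):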
```
group                                         base                             target
-----                                         ----                             ------
array_has_any/no_match/10                     1.00  7.4±0.08ms      ? ?/sec    1.00  7.4±0.04ms      ? ?/sec
array_has_any/no_match/100                    1.00  23.0±0.05ms     ? ?/sec    1.02  23.6±0.05ms     ? ?/sec
array_has_any/no_match/500                    1.00  91.7±0.11ms     ? ?/sec    1.05  96.2±0.53ms     ? ?/sec
array_has_any/scalar_no_match/10              1.00  2.1±0.02ms      ? ?/sec    1.00  2.1±0.00ms      ? ?/sec
array_has_any/scalar_no_match/100             1.00  20.3±0.07ms     ? ?/sec    1.01  20.5±0.06ms     ? ?/sec
array_has_any/scalar_no_match/500             1.00  134.4±1.10ms    ? ?/sec    1.00  134.6±0.37ms    ? ?/sec
array_has_any/scalar_some_match/10            1.00  1038.6±12.05µs  ? ?/sec    1.00  1036.2±4.87µs   ? ?/sec
array_has_any/scalar_some_match/100           1.00  10.7±0.09ms     ? ?/sec    1.00  10.7±0.07ms     ? ?/sec
array_has_any/scalar_some_match/500           1.00  83.1±0.39ms     ? ?/sec    1.00  83.2±0.40ms     ? ?/sec
array_has_any/some_match/10                   1.01  6.5±0.03ms      ? ?/sec    1.00  6.4±0.04ms      ? ?/sec
array_has_any/some_match/100                  1.00  14.6±0.06ms     ? ?/sec    1.01  14.8±0.05ms     ? ?/sec
array_has_any/some_match/500                  1.00  50.1±0.13ms     ? ?/sec    1.06  52.9±0.22ms     ? ?/sec
array_has_any_scalar/i64_no_match/1           1.00  359.7±1.46µs    ? ?/sec    1.04  373.2±2.89µs    ? ?/sec
array_has_any_scalar/i64_no_match/10          1.91  844.9±9.22µs    ? ?/sec    1.00  441.6±9.23µs    ? ?/sec
array_has_any_scalar/i64_no_match/100         1.59  1003.3±34.17µs  ? ?/sec    1.00  629.3±21.51µs   ? ?/sec
array_has_any_scalar/i64_no_match/1000        1.77  955.1±12.20µs   ? ?/sec    1.00  540.2±12.02µs   ? ?/sec
array_has_any_scalar/string_no_match/1        1.01  256.7±1.83µs    ? ?/sec    1.00  255.1±1.92µs    ? ?/sec
array_has_any_scalar/string_no_match/10       1.97  826.3±13.46µs   ? ?/sec    1.00  420.2±8.06µs    ? ?/sec
array_has_any_scalar/string_no_match/100      1.65  910.6±19.59µs   ? ?/sec    1.00  552.9±17.14µs   ? ?/sec
array_has_any_scalar/string_no_match/1000     1.90  874.5±12.71µs   ? ?/sec    1.00  459.8±8.70µs    ? ?/sec
array_has_any_strings/no_match/10             1.00  5.0±0.01ms      ? ?/sec    1.00  5.0±0.02ms      ? ?/sec
array_has_any_strings/no_match/100            1.01  22.2±0.05ms     ? ?/sec    1.00  22.0±0.03ms     ? ?/sec
array_has_any_strings/no_match/500            1.00  128.7±0.18ms    ? ?/sec    1.03  132.1±1.15ms    ? ?/sec
array_has_any_strings/scalar_no_match/10      1.00  863.4±2.22µs    ? ?/sec    1.07  920.9±1.92µs    ? ?/sec
array_has_any_strings/scalar_no_match/100     1.00  7.3±0.02ms      ? ?/sec    1.10  8.0±0.02ms      ? ?/sec
array_has_any_strings/scalar_no_match/500     1.00  87.1±0.14ms     ? ?/sec    1.05  91.4±0.14ms     ? ?/sec
array_has_any_strings/scalar_some_match/10    1.00  769.2±2.00µs    ? ?/sec    1.03  790.9±3.03µs    ? ?/sec
array_has_any_strings/scalar_some_match/100   1.00  4.1±0.17ms      ? ?/sec    1.04  4.3±0.22ms      ? ?/sec
array_has_any_strings/scalar_some_match/500   1.00  16.9±0.08ms     ? ?/sec    1.08  18.2±0.07ms     ? ?/sec
array_has_any_strings/some_match/10           1.00  4.3±0.02ms      ? ?/sec    1.00  4.3±0.01ms      ? ?/sec
array_has_any_strings/some_match/100          1.01  14.3±0.05ms     ? ?/sec    1.00  14.1±0.04ms     ? ?/sec
array_has_any_strings/some_match/500          1.00  53.5±0.11ms     ? ?/sec    1.00  53.6±0.07ms     ? ?/sec
```
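
For anyone reading the ratio columns: they appear to be each side's mean time divided by the faster of the two, so the faster side always shows 1.00. The sketch below is only an illustration of that arithmetic, not the benchmark tool's code; the helper name `ratios` is hypothetical, and the times are copied from three rows of the table, converted to microseconds.

```rust
/// Hypothetical helper: relative cost of each side, normalized so that
/// the faster side is 1.0. Assumes both times use the same unit.
fn ratios(base: f64, target: f64) -> (f64, f64) {
    let fastest = base.min(target);
    (base / fastest, target / fastest)
}

fn main() {
    // (benchmark name, base mean, target mean), all in microseconds.
    let rows = [
        ("array_has_any_scalar/i64_no_match/10", 844.9, 441.6),
        ("array_has_any_scalar/string_no_match/10", 826.3, 420.2),
        ("array_has_any_strings/scalar_no_match/100", 7300.0, 8000.0),
    ];
    for (name, base, target) in rows {
        let (b, t) = ratios(base, target);
        println!("{name}: base {b:.2}, target {t:.2}");
    }
}
```

Rounded to two decimals, this reproduces the 1.91 and 1.97 in the base column for the two scalar rows and the 1.10 in the target column for the strings row.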