yordan-pavlov edited a comment on pull request #7037: URL: https://github.com/apache/arrow/pull/7037#issuecomment-624864888
I agree with @paddyhoran - if the goal is just to enable the use of arrow with stable Rust, it would be reasonable to not enable the SIMD feature by default, but still keep it available as a choice for those users who need the best performance possible. A lot of work has gone into the SIMD feature already and it would be a shame to remove it prematurely, without doing enough benchmarking.

Furthermore, I think Rust could have a great future in the big data space and I think this project could play an important part. But SIMD is important in big data. So we should be looking to have SIMD stabilized (in Rust) rather than remove it. If SIMD is removed from arrow, what killer feature would motivate its stabilization in Rust?

For convenience, here are the results from my filtering benchmarks:

| Benchmark                       | Time      |
| ------------------------------- | --------- |
| filter with loop                | 567.78 us |
| filter with iter                | 671.40 us |
| filter with arrow loop          | 1.2900 ms |
| filter with arrow NO SIMD       | 8.5939 ms |
| filter with arrow SIMD (array)  | 599.05 us |
| filter with arrow SIMD (scalar) | 381.38 us |

In the table above we can see that SIMD filtering against scalar values (381.38 us) is 49% faster than a loop (567.78 us), and 76% faster than an iterator implementation (671.40 us). This could mean the difference between waiting 12h or 7h for a job to complete. So I think more benchmarking, profiling and performance improvements have to be done before it can be decided with confidence whether to remove SIMD or not.

The source code for the benchmarks used to produce the results above is here: https://github.com/yordan-pavlov/arrow-benchmark/blob/master/rust/arrow_benchmark/src/main.rs

I am happy to contribute benchmarks, I just have to figure out how / if they would fit in the main arrow repo.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]
