pandalee99 opened a new pull request, #1720: URL: https://github.com/apache/fury/pull/1720
## What does this PR do?

ref: https://arxiv.org/pdf/1902.08318.pdf
ref: https://github.com/simdutf/simdutf

I studied SIMD techniques, along with the paper and the simdutf implementation referenced above, and applied SIMD to string detection.

First, I implemented the logic for Latin character detection as a baseline:

``` c++
// Baseline implementation
bool isLatin_Baseline(const std::string& str) {
  for (char c : str) {
    if (static_cast<unsigned char>(c) >= 128) {
      return false;
    }
  }
  return true;
}
```

<img width="393" alt="image" src="https://raw.githubusercontent.com/pandalee99/image_store/master/hexo/simd_base_line_test1.png">

Then I used SSE2 to speed it up. The idea is to read multiple characters at once and check them with bitwise operations. This was a little faster, but I didn't think it was enough, so I tried again with AVX2.

<img width="493" alt="image" src="https://raw.githubusercontent.com/pandalee99/image_store/master/hexo/simd_test_all_1.png">

In terms of efficiency, it is already much faster than the baseline. To verify that the logic is also correct, I added test cases:

``` C++
TEST(StringUtilTest, TestIsLatinLogic)
```

Finally, I ran the tests:

<img width="493" alt="image" src="https://raw.githubusercontent.com/pandalee99/image_store/master/hexo/simd_ubantu_test_1.png">

Done.

## Related issues

Closes #313

## Does this PR introduce any user-facing change?

- [ ] Does this PR introduce any public API change?
- [ ] Does this PR introduce any binary protocol compatibility change?
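
For readers who want a concrete picture of the SSE2 step in the description above (read 16 characters at once, then test them with bit operations), here is a minimal sketch. It is not the code added by this PR: the function name `isLatinSse2Sketch` is hypothetical, and the sketch assumes that "Latin" means every byte is below 128, as in the baseline. On targets without SSE2 it falls back to the scalar loop.

``` c++
#include <cstddef>
#include <string>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics
#endif

// Hypothetical helper for illustration only; not the function from the PR.
bool isLatinSse2Sketch(const std::string& str) {
  const char* data = str.data();
  const std::size_t len = str.size();
  std::size_t i = 0;
#if defined(__SSE2__)
  // Check 16 bytes per iteration using an unaligned load.
  for (; i + 16 <= len; i += 16) {
    const __m128i chunk =
        _mm_loadu_si128(reinterpret_cast<const __m128i*>(data + i));
    // _mm_movemask_epi8 gathers the top bit of each byte into an int.
    // A non-zero mask means at least one byte is >= 128.
    if (_mm_movemask_epi8(chunk) != 0) {
      return false;
    }
  }
#endif
  // Scalar tail for the remaining bytes (or the whole string without SSE2).
  for (; i < len; ++i) {
    if (static_cast<unsigned char>(data[i]) >= 128) {
      return false;
    }
  }
  return true;
}
```

An AVX2 variant would presumably follow the same shape with 32-byte loads (`_mm256_loadu_si256` and `_mm256_movemask_epi8`), at the cost of requiring AVX2 support at compile time and on the target CPU.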
## Benchmark
