urlyy opened a new pull request, #1778: URL: https://github.com/apache/fury/pull/1778
## What does this PR do?

Building on #1730, this PR adds SIMD implementations of the UTF-16 to UTF-8 conversion for the AVX, SSE, and NEON instruction sets, together with benchmarks.

The implementation references the following `simdutf` sources:

- https://github.com/simdutf/simdutf/blob/master/src/westmere/sse_convert_utf16_to_utf8.cpp
- https://github.com/simdutf/simdutf/blob/master/src/haswell/avx2_convert_utf16_to_utf8.cpp
- https://github.com/simdutf/simdutf/blob/5c1a86887010cd2b4d648049c4d73de81a026341/src/arm64/arm_convert_utf16_to_utf8.cpp
- https://github.com/simdutf/simdutf/blob/master/src/tables/utf16_to_utf8_tables.h

Notice:

- I use two precomputed lookup tables, the same as `simdutf` does, but they take 1600 lines.
- I copied two UTF-8 encoded text files into the Rust project for the benchmark.
- `util.rs` might need to be merged with `string_util.rs`.

## Related issues

- #1547
- #1730

## Does this PR introduce any user-facing change?

- [x] Does this PR introduce any public API change?
- [ ] Does this PR introduce any binary protocol compatibility change?

## Benchmark

The dataset is from https://github.com/lemire/unicode_lipsum/tree/main/wikipedia_mars

Both the SIMD and non-SIMD approaches are faster than `String::from_utf16(bytes)`. In the benchmark on my Windows 11 x86 machine, the SIMD approach is only slightly faster than the scalar approach, which is less of a gain than I expected. AVX seems better than SSE because AVX handles 256 bits at a time while SSE handles only 128 bits at a time. When the input contains surrogate pairs, the algorithm falls back to the scalar (non-SIMD) path, so in that case the SIMD approach might be slower than the scalar one.
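
The surrogate-pair fallback mentioned in the benchmark notes can be illustrated with a small scalar sketch. This is not the code from this PR: it uses no SIMD intrinsics or lookup tables, and the names `CHUNK`, `push_utf8`, `scalar_step`, and `utf16_to_utf8` are hypothetical. It only assumes the general structure: look at a block of eight UTF-16 units (what one 128-bit register would hold), convert the block unit by unit when it contains no surrogates, and otherwise step through it with a scalar routine that pairs surrogates.

```rust
// Eight u16 units fit in one 128-bit register (assumed block size for this sketch).
const CHUNK: usize = 8;

// Append the UTF-8 encoding of one Unicode scalar value to `out`.
fn push_utf8(cp: u32, out: &mut Vec<u8>) {
    match cp {
        0..=0x7F => out.push(cp as u8),
        0x80..=0x7FF => out.extend_from_slice(&[
            0xC0 | (cp >> 6) as u8,
            0x80 | (cp & 0x3F) as u8,
        ]),
        0x800..=0xFFFF => out.extend_from_slice(&[
            0xE0 | (cp >> 12) as u8,
            0x80 | ((cp >> 6) & 0x3F) as u8,
            0x80 | (cp & 0x3F) as u8,
        ]),
        _ => out.extend_from_slice(&[
            0xF0 | (cp >> 18) as u8,
            0x80 | ((cp >> 12) & 0x3F) as u8,
            0x80 | ((cp >> 6) & 0x3F) as u8,
            0x80 | (cp & 0x3F) as u8,
        ]),
    }
}

// Scalar fallback: convert one code point starting at `units[0]` and
// return how many UTF-16 units it consumed (1, or 2 for a surrogate pair).
fn scalar_step(units: &[u16], out: &mut Vec<u8>) -> Result<usize, String> {
    let w1 = units[0] as u32;
    match w1 {
        0xD800..=0xDBFF => match units.get(1).map(|&w| w as u32) {
            Some(w2 @ 0xDC00..=0xDFFF) => {
                push_utf8(0x10000 + ((w1 - 0xD800) << 10) + (w2 - 0xDC00), out);
                Ok(2)
            }
            _ => Err("unpaired high surrogate".to_string()),
        },
        0xDC00..=0xDFFF => Err("unpaired low surrogate".to_string()),
        _ => {
            push_utf8(w1, out);
            Ok(1)
        }
    }
}

pub fn utf16_to_utf8(input: &[u16]) -> Result<String, String> {
    // Each UTF-16 unit expands to at most 3 UTF-8 bytes.
    let mut out = Vec::with_capacity(input.len() * 3);
    let mut i = 0;
    while i < input.len() {
        if i + CHUNK <= input.len() {
            let chunk = &input[i..i + CHUNK];
            // Surrogates are exactly the units whose top five bits are 11011.
            let has_surrogate = chunk.iter().any(|&u| (u & 0xF800) == 0xD800);
            if !has_surrogate {
                // Block path: every unit is a standalone code point, so units can be
                // converted independently. A SIMD version would do this with vector ops.
                for &u in chunk {
                    push_utf8(u as u32, &mut out);
                }
                i += CHUNK;
                continue;
            }
        }
        // Fallback path: a surrogate is present (or fewer than CHUNK units remain).
        i += scalar_step(&input[i..], &mut out)?;
    }
    String::from_utf8(out).map_err(|e| e.to_string())
}
```

Calling `utf16_to_utf8(&units)` on the output of `str::encode_utf16` should return the original text. Input dense in surrogate pairs repeatedly pays for the block check and then takes the scalar step anyway, which is consistent with the note that the SIMD approach might be worse in that case.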
