cyb70289 commented on a change in pull request #11376: URL: https://github.com/apache/arrow/pull/11376#discussion_r725765311
########## File path: cpp/src/arrow/util/utf8.h ##########
@@ -210,44 +210,37 @@ inline bool ValidateUTF8(const uint8_t* data, int64_t size) {
   return ARROW_PREDICT_TRUE(state == internal::kUTF8ValidateAccept);
 }
-inline bool ValidateUTF8(const util::string_view& str) {
+static inline bool ValidateUTF8(const util::string_view& str) {
   const uint8_t* data = reinterpret_cast<const uint8_t*>(str.data());
   const size_t length = str.size();
   return ValidateUTF8(data, length);
 }
-inline bool ValidateAsciiSw(const uint8_t* data, int64_t len) {
+static inline bool ValidateAsciiSw(const uint8_t* data, int64_t len) {

Review comment:
The original code unrolls the loop manually, expecting to make better use of the CPU pipeline. But the manual unrolling prevents the compiler from doing better optimization, in particular auto-vectorization.

In fact, this simple code performs the same as the SIMD code when built with clang. With gcc, however, it shows a big regression, so I left the SIMD code untouched.

- _clang_: SIMD is no faster than the naive code https://quick-bench.com/q/R5S9gDfyCzxs4pyQO3rLszMLCBI
- _gcc_: SIMD is faster https://quick-bench.com/q/Xes5_3-CGjJbYNikY0E0BWdfVTo
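To make the trade-off concrete, here is a minimal sketch of the two loop styles discussed above. It is not Arrow's code: `ValidateAsciiSimple` and `ValidateAsciiUnrolled` are hypothetical names, and the unrolled variant only mimics the general idea of hand-unrolling (8 bytes per iteration via `memcpy` into a `uint64_t`). Both rely on the fact that a byte is ASCII iff its high bit (0x80) is clear, so OR-ing all bytes and testing that bit is sufficient. Whether either loop is actually vectorized depends on the compiler and flags, which is exactly what the clang/gcc benchmarks above measure.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical "simple" ASCII check (not Arrow's implementation).
// A plain OR-reduction over every byte with no early exit and no manual
// unrolling, which leaves the compiler free to auto-vectorize it if it can.
static inline bool ValidateAsciiSimple(const uint8_t* data, int64_t len) {
  uint8_t orall = 0;
  for (int64_t i = 0; i < len; ++i) {
    orall |= data[i];
  }
  // Any non-ASCII byte sets bit 7 in the accumulated value.
  return orall < 0x80;
}

// Hypothetical hand-unrolled variant: processes 8 bytes per iteration by
// loading them into a uint64_t (memcpy avoids unaligned/aliasing UB),
// then handles the remaining 0..7 bytes one at a time.
static inline bool ValidateAsciiUnrolled(const uint8_t* data, int64_t len) {
  uint64_t orwords = 0;
  int64_t i = 0;
  for (; i + 8 <= len; i += 8) {
    uint64_t word;
    std::memcpy(&word, data + i, sizeof(word));
    orwords |= word;
  }
  uint8_t tail = 0;
  for (; i < len; ++i) {
    tail |= data[i];
  }
  // Test the high bit of each of the 8 accumulated bytes, plus the tail.
  return (orwords & 0x8080808080808080ULL) == 0 && tail < 0x80;
}
```

Both functions return the same result for any input; the difference is only in how much freedom they give the optimizer. Benchmarking them side by side under both compilers, as the comment does with quick-bench, is the reliable way to see which style a given toolchain favors.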