pitrou commented on a change in pull request #10317:
URL: https://github.com/apache/arrow/pull/10317#discussion_r632636762
##########
File path: cpp/src/arrow/compute/kernels/scalar_string.cc
##########
@@ -266,6 +266,52 @@ void EnsureLookupTablesFilled() {}
#endif // ARROW_WITH_UTF8PROC
+template <typename Type>
+struct AsciiReverse : StringTransform<Type, AsciiReverse<Type>> {
+  using Base = StringTransform<Type, AsciiReverse<Type>>;
+  using offset_type = typename Base::offset_type;
+
+  bool Transform(const uint8_t* input, offset_type input_string_ncodeunits,
+                 uint8_t* output, offset_type* output_written) {
+    uint8_t utf8_char_found = 0;
+    for (offset_type i = 0; i < input_string_ncodeunits; i++) {
+      // if a utf8 char is found, report to utf8_char_found
+      utf8_char_found |= input[i] & 0x80;
Review comment:
The ASCII version reverses byte by byte, so it would produce invalid UTF-8
output for valid (but non-ASCII) input, which is why it checks that the input
is ASCII. Invalid input is a different situation, which we don't need to check for.
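
To illustrate the point, here is a minimal standalone sketch. It is not the code from this PR: `IsAscii`, `ReverseAscii` and `ReverseBytes` are hypothetical names, and it uses `std::string` instead of the kernel's raw buffers. It assumes the same high-bit accumulation as the hunk above and shows what an unchecked byte-wise reversal does to a valid two-byte UTF-8 sequence.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: true when every byte has its high bit clear (pure ASCII).
bool IsAscii(const std::string& s) {
  uint8_t high_bits = 0;
  for (unsigned char c : s) {
    // Accumulate bit 7 of every byte; any non-ASCII byte sets it.
    high_bits |= c & 0x80;
  }
  return high_bits == 0;
}

// Hypothetical helper: byte-wise reversal that refuses non-ASCII input,
// because reversing the bytes of a multi-byte character breaks its encoding.
bool ReverseAscii(const std::string& input, std::string* output) {
  if (!IsAscii(input)) return false;
  output->assign(input.rbegin(), input.rend());
  return true;
}

// Hypothetical helper: the same reversal without the check, for comparison.
std::string ReverseBytes(const std::string& input) {
  return std::string(input.rbegin(), input.rend());
}
```

For the valid input `"\xC3\xA9"` (U+00E9), `ReverseBytes` returns the bytes `A9 C3`, which begin with a continuation byte and so are not valid UTF-8, while `ReverseAscii` rejects the input up front. Neither function tries to detect input that was already invalid UTF-8.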