projjal commented on a change in pull request #7641:
URL: https://github.com/apache/arrow/pull/7641#discussion_r452613034



##########
File path: cpp/src/gandiva/precompiled/string_ops.cc
##########
@@ -320,6 +385,143 @@ const char* trim_utf8(gdv_int64 context, const char* data, gdv_int32 data_len,
   return data + start;
 }
 
+// Trims characters present in the trim text from the left end of the base text
+FORCE_INLINE
+const char* ltrim_utf8_utf8(gdv_int64 context, const char* basetext,
+                            gdv_int32 basetext_len, const char* trimtext,
+                            gdv_int32 trimtext_len, int32_t* out_len) {

Review comment:
       The UTF-8 handling seems incorrect. You need to decode each UTF-8
character and compare whole characters between trim_text and the target
string instead of matching individual bytes. Otherwise, a byte of a multibyte
character in trim_text might match a byte of a different multibyte character
in the target string.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

