rkavanap commented on a change in pull request #11522:
URL: https://github.com/apache/arrow/pull/11522#discussion_r743381048



##########
File path: cpp/src/gandiva/precompiled/string_ops.cc
##########
@@ -1642,6 +1642,83 @@ const char* convert_toUTF8(int64_t context, const char* value, int32_t value_len
   return value;
 }
 
+// Calculate the levenshtein distance between two string values
+FORCE_INLINE
+gdv_int32 levenshtein_utf8_utf8(int64_t context, const char* in1, int32_t in1_len,
+                                const char* in2, int32_t in2_len) {
+  if (in1_len < 0 || in2_len < 0) {
+    gdv_fn_context_set_error_msg(context, "String length must be greater than 0");
+    return 0;
+  }
+
+  // Check input size 0
+  if (in1_len == 0) {
+    return in2_len;
+  }
+  if (in2_len == 0) {
+    return in1_len;
+  }
+
+  int* ptr = new int[(in2_len + 1) * 2];

Review comment:
       One more optimization is possible (looking at the Java code): what if
in1_len is much smaller than in2_len? The buffer here is sized by in2_len, so
doing an initial swap, with the longer string driving the outer loop and the
shorter string the inner loop, may make it more efficient. Or are you planning
to make sure in2 is always shorter than in1 when calling the method?
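
To illustrate the swap suggested above, here is a minimal sketch, not the code from this PR. `levenshtein_sketch` is a hypothetical standalone helper: it assumes non-negative lengths, compares bytes rather than UTF-8 code points, and uses `std::vector` instead of the raw `new int[]` buffer in the diff.

```cpp
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical helper for illustration only; assumes in1_len >= 0 and
// in2_len >= 0 and compares raw bytes (no UTF-8 decoding).
int32_t levenshtein_sketch(const char* in1, int32_t in1_len,
                           const char* in2, int32_t in2_len) {
  // Swap so that in2 is always the shorter input. The two row buffers are
  // sized by in2_len, so this keeps the allocation as small as possible.
  if (in1_len < in2_len) {
    std::swap(in1, in2);
    std::swap(in1_len, in2_len);
  }
  // After the swap, in2 is the shorter one; an empty in2 means the
  // distance is simply the length of in1.
  if (in2_len == 0) {
    return in1_len;
  }

  // prev holds row i-1 of the distance matrix, curr holds row i.
  std::vector<int32_t> prev(in2_len + 1);
  std::vector<int32_t> curr(in2_len + 1);
  std::iota(prev.begin(), prev.end(), 0);  // row 0: 0, 1, 2, ..., in2_len

  for (int32_t i = 1; i <= in1_len; ++i) {  // outer loop: longer string
    curr[0] = i;
    for (int32_t j = 1; j <= in2_len; ++j) {  // inner loop: shorter string
      const int32_t cost = (in1[i - 1] == in2[j - 1]) ? 0 : 1;
      curr[j] = std::min({prev[j] + 1,          // deletion
                          curr[j - 1] + 1,      // insertion
                          prev[j - 1] + cost});  // substitution
    }
    std::swap(prev, curr);
  }
  return prev[in2_len];
}
```

The distance is symmetric, so the swap does not change the result. The number of cell updates stays at in1_len * in2_len either way; what the swap reduces is the buffer size, which becomes proportional to the shorter length.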



