Re: [PR] [CALCITE-5826] Add FIND_IN_SET function (enabled in Hive and Spark libraries) [calcite]

via GitHub Mon, 30 Oct 2023 08:32:48 -0700


herunkang2018 commented on code in PR #3317:
URL: https://github.com/apache/calcite/pull/3317#discussion_r1376420977



##########
core/src/main/java/org/apache/calcite/runtime/SqlFunctions.java:
##########
@@ -1076,6 +1078,41 @@ public static int levenshtein(String string1, String string2) {
     return LEVENSHTEIN_DISTANCE.apply(string1, string2);
   }
 
+  /** SQL FIND_IN_SET(matchStr, textStr) function.
+   * Returns the index (1-based) of the given matchStr
+   * in the comma-delimited list textStr. Returns 0,
+   * if the matchStr is not found or if the matchStr
+   * contains a comma. */
+  public static @Nullable Integer findInSet(
+      @Nullable String matchStr,
+      @Nullable String textStr) {
+    if (matchStr == null || textStr == null) {
+      return null;
+    }
+    if (matchStr.contains(String.valueOf(COMMA_DELIMITER))) {
+      return 0;
+    }
+    final int textStrLen = textStr.length();
+    final int matchStrLen = matchStr.length();
+    int n = 1;
+    int lastCommaIndex = -1;
+    for (int i = 0; i < textStrLen; i++) {
+      if (textStr.charAt(i) == COMMA_DELIMITER) {

Review Comment:
   Thanks for the advice. The current implementation is consistent with Spark's.
Switching to `split` would improve readability, but it may cost some
performance: the for loop can exit early as soon as a match is found, whereas
`split` always has to process the whole string first.
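
To make the trade-off concrete, here is a rough sketch of the two approaches side by side. It is not the code in this PR or in Spark: the class name `FindInSetSketch` and both method names are hypothetical, null handling is left out, and the loop is simplified for illustration.

```java
// Illustrative sketch only; names are hypothetical and nulls are not handled.
final class FindInSetSketch {
  private static final char COMMA = ',';

  /** Single pass over textStr; returns as soon as a matching item is found. */
  static int findInSetLoop(String matchStr, String textStr) {
    if (matchStr.indexOf(COMMA) >= 0) {
      return 0; // an item can never contain the delimiter
    }
    int n = 1;     // 1-based index of the current item
    int start = 0; // offset where the current item begins
    for (int i = 0; i <= textStr.length(); i++) {
      // Treat the end of the string as a final delimiter.
      if (i == textStr.length() || textStr.charAt(i) == COMMA) {
        if (i - start == matchStr.length()
            && textStr.regionMatches(start, matchStr, 0, matchStr.length())) {
          return n; // early exit: the rest of textStr is never scanned
        }
        start = i + 1;
        n++;
      }
    }
    return 0;
  }

  /** Shorter to read, but split scans and allocates for every item up front. */
  static int findInSetSplit(String matchStr, String textStr) {
    if (matchStr.indexOf(COMMA) >= 0) {
      return 0;
    }
    // Limit -1 keeps trailing empty items, e.g. "a," yields ["a", ""].
    String[] items = textStr.split(",", -1);
    for (int i = 0; i < items.length; i++) {
      if (items[i].equals(matchStr)) {
        return i + 1;
      }
    }
    return 0;
  }

  public static void main(String[] args) {
    System.out.println(findInSetLoop("ab", "abc,b,ab,c,def"));
    System.out.println(findInSetSplit("ab", "abc,b,ab,c,def"));
  }
}
```

Both versions should return the same index; the difference is that the `split` version builds the full array of items even when the match is the first one.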



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

