Akanksha-kedia commented on code in PR #18790:
URL: https://github.com/apache/pinot/pull/18790#discussion_r3440124806


##########
pinot-common/src/main/java/org/apache/pinot/common/function/scalar/StringFunctions.java:
##########
@@ -1120,4 +1120,87 @@ public static String regexpSubstr(String input, String 
regexp) {
     Matcher matcher = Pattern.compile(regexp).matcher(input);
     return matcher.find() ? matcher.group() : null;
   }
+
+  /**
+   * Returns a string with every occurrence of a character in {@code from} 
replaced by the corresponding character in
+   * {@code to}. Characters in {@code input} that do not appear in {@code 
from} are left unchanged. If {@code from} is
+   * longer than {@code to}, characters beyond the length of {@code to} are 
deleted from the result. Equivalent to the
+   * SQL standard {@code TRANSLATE(string, from, to)} function supported by 
PostgreSQL, Oracle, and Trino.
+   *
+   * <p>Examples:
+   * <pre>
+   *   translate("hello", "aeiou", "AEIOU") → "hEllO"
+   *   translate("abc", "abc", "xy")        → "xy"   (c has no target → 
deleted)
+   *   translate("abcdef", "ace", "XY")     → "XbYdf" (e has no target → 
deleted)
+   * </pre>
+   *
+   * @param input the source string
+   * @param from  characters to search for (must not be null)
+   * @param to    replacement characters (must not be null; may be shorter 
than {@code from})
+   * @return the translated string
+   */
+  @ScalarFunction
+  public static String translate(String input, String from, String to) {
+    if (input.isEmpty() || from.isEmpty()) {

Review Comment:
   TRANSLATE was already merged via PR #18779. This PR only adds OVERLAY — it 
depends on that branch in the history but the final diff vs master will only 
contain the OVERLAY additions.



##########
pinot-common/src/test/java/org/apache/pinot/common/function/scalar/StringFunctionsTest.java:
##########
@@ -568,4 +568,75 @@ public void testRegexpSubstr() {
     assertEquals(StringFunctions.regexpSubstr("", "a"), null);
     assertEquals(StringFunctions.regexpSubstr("Hello World", "[A-Z][a-z]+"), 
"Hello");
   }
+
+  // ==================== Tests for translate ====================
+
+  @Test
+  public void testTranslate() {
+    // Basic character replacement
+    assertEquals(StringFunctions.translate("hello", "aeiou", "AEIOU"), 
"hEllO");
+
+    // Replacement and deletion: 'c' has no target in 'to' → deleted
+    assertEquals(StringFunctions.translate("abc", "abc", "xy"), "xy");
+
+    // Mixed: replacement and deletion
+    assertEquals(StringFunctions.translate("abcdef", "ace", "XY"), "XbYdf");
+
+    // No characters in 'from' match → input unchanged
+    assertEquals(StringFunctions.translate("hello", "xyz", "123"), "hello");
+
+    // Empty 'from' → input unchanged
+    assertEquals(StringFunctions.translate("hello", "", "abc"), "hello");
+
+    // Empty input → empty result
+    assertEquals(StringFunctions.translate("", "abc", "xyz"), "");
+
+    // All characters deleted (to is empty)
+    assertEquals(StringFunctions.translate("abc", "abc", ""), "");
+
+    // SQL standard example: partial substitution — digits 0-3 map to z/e/r/o; 
digits 4-9 are deleted
+    assertEquals(StringFunctions.translate("12300", "0123456789", "zero"), 
"erozz");
+
+    // Duplicate characters in 'from': first occurrence wins
+    assertEquals(StringFunctions.translate("aaa", "aa", "XY"), "XXX");
+  }
+
+  // ==================== Tests for overlay ====================
+
+  @Test
+  public void testOverlay() {
+    // Basic replacement: length defaults to length of replacement
+    assertEquals(StringFunctions.overlay("hello world", "there", 7), "hello 
there");
+
+    // Explicit length equal to length of replacement — same result
+    assertEquals(StringFunctions.overlay("hello world", "there", 7, 5), "hello 
there");
+
+    // Zero-length deletion: pure insertion
+    assertEquals(StringFunctions.overlay("abcdef", "XY", 3, 0), "abXYcdef");
+
+    // Delete more than replacement length: replacement is shorter than 
deleted span
+    // FROM 3 FOR 4 removes positions 3-6 (cdef), nothing remains after 
position 6
+    assertEquals(StringFunctions.overlay("abcdef", "XY", 3, 4), "abXY");
+
+    // Replace at start (position 1)
+    assertEquals(StringFunctions.overlay("abcdef", "Z", 1, 1), "Zbcdef");
+
+    // Replace at end
+    assertEquals(StringFunctions.overlay("abcdef", "Z", 6, 1), "abcdeZ");
+
+    // Replace entire string
+    assertEquals(StringFunctions.overlay("abcdef", "XY", 1, 6), "XY");
+
+    // Empty replacement (deletion only)
+    assertEquals(StringFunctions.overlay("abcdef", "", 3, 2), "abef");
+
+    // Empty input: result is the replacement
+    assertEquals(StringFunctions.overlay("", "abc", 1), "abc");
+
+    // start beyond end: appends replacement
+    assertEquals(StringFunctions.overlay("abc", "XY", 10), "abcXY");
+
+    // length clamped: cannot delete past end of string
+    assertEquals(StringFunctions.overlay("abc", "Z", 2, 100), "aZ");

Review Comment:
   Good catch. Added assertions for `start <= 0` (clamped to position 1) and 
`length < 0` (clamped to 0, pure insertion) in the latest commit.



##########
pinot-common/src/test/java/org/apache/pinot/common/function/scalar/StringFunctionsTest.java:
##########
@@ -568,4 +568,75 @@ public void testRegexpSubstr() {
     assertEquals(StringFunctions.regexpSubstr("", "a"), null);
     assertEquals(StringFunctions.regexpSubstr("Hello World", "[A-Z][a-z]+"), 
"Hello");
   }
+
+  // ==================== Tests for translate ====================
+
+  @Test
+  public void testTranslate() {
+    // Basic character replacement
+    assertEquals(StringFunctions.translate("hello", "aeiou", "AEIOU"), 
"hEllO");
+
+    // Replacement and deletion: 'c' has no target in 'to' → deleted
+    assertEquals(StringFunctions.translate("abc", "abc", "xy"), "xy");
+
+    // Mixed: replacement and deletion
+    assertEquals(StringFunctions.translate("abcdef", "ace", "XY"), "XbYdf");
+
+    // No characters in 'from' match → input unchanged
+    assertEquals(StringFunctions.translate("hello", "xyz", "123"), "hello");
+
+    // Empty 'from' → input unchanged
+    assertEquals(StringFunctions.translate("hello", "", "abc"), "hello");
+
+    // Empty input → empty result
+    assertEquals(StringFunctions.translate("", "abc", "xyz"), "");
+
+    // All characters deleted (to is empty)
+    assertEquals(StringFunctions.translate("abc", "abc", ""), "");
+
+    // SQL standard example: partial substitution — digits 0-3 map to z/e/r/o; 
digits 4-9 are deleted
+    assertEquals(StringFunctions.translate("12300", "0123456789", "zero"), 
"erozz");
+
+    // Duplicate characters in 'from': first occurrence wins
+    assertEquals(StringFunctions.translate("aaa", "aa", "XY"), "XXX");
+  }
+
+  // ==================== Tests for overlay ====================
+
+  @Test
+  public void testOverlay() {
+    // Basic replacement: length defaults to length of replacement
+    assertEquals(StringFunctions.overlay("hello world", "there", 7), "hello 
there");
+
+    // Explicit length equal to length of replacement — same result
+    assertEquals(StringFunctions.overlay("hello world", "there", 7, 5), "hello 
there");
+
+    // Zero-length deletion: pure insertion
+    assertEquals(StringFunctions.overlay("abcdef", "XY", 3, 0), "abXYcdef");
+
+    // Delete more than replacement length: replacement is shorter than 
deleted span
+    // FROM 3 FOR 4 removes positions 3-6 (cdef), nothing remains after 
position 6
+    assertEquals(StringFunctions.overlay("abcdef", "XY", 3, 4), "abXY");

Review Comment:
   Fixed in the latest commit — the test now correctly asserts `abXY` and the 
PR description example has been updated to match.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to