akravchukdremio commented on code in PR #50187:
URL: https://github.com/apache/arrow/pull/50187#discussion_r3444194634


##########
cpp/src/gandiva/precompiled/string_ops_test.cc:
##########
@@ -1971,6 +1971,62 @@ TEST(TestStringOps, TestReplace) {
   EXPECT_EQ(std::string(out_str, out_len), "TestString");
   EXPECT_FALSE(ctx.has_error());
 
+  // Large output (>64 KB) must not overflow: buffer is sized to the exact result.
+  std::string large_in(35000, 'X');
+  std::string large_expected(70000, '\0');
+  for (int i = 0; i < 35000; ++i) {
+    large_expected[2 * i] = 'X';
+    large_expected[2 * i + 1] = 'Y';
+  }
+  out_str = replace_utf8_utf8_utf8(ctx_ptr, large_in.data(),
+                                   static_cast<int32_t>(large_in.size()), "X", 1, "XY", 2,
+                                   &out_len);
+  EXPECT_EQ(out_len, 70000);
+  EXPECT_EQ(std::string(out_str, out_len), large_expected);
+  EXPECT_FALSE(ctx.has_error());
+
+  // Large shrinking output ("XX" -> "X") on a >64 KB input.
+  std::string large_shrink_in(70000, 'X');
+  std::string large_shrink_expected(35000, 'X');
+  out_str = replace_utf8_utf8_utf8(ctx_ptr, large_shrink_in.data(),
+                                   static_cast<int32_t>(large_shrink_in.size()), "XX", 2,
+                                   "X", 1, &out_len);
+  EXPECT_EQ(out_len, 35000);
+  EXPECT_EQ(std::string(out_str, out_len), large_shrink_expected);
+  EXPECT_FALSE(ctx.has_error());
+
+  // Edge case: result size of exactly 0 (every byte of text is removed). Takes
+  // the no-scan shrink path (to_str_len <= from_str_len).
+  out_str = replace_utf8_utf8_utf8(ctx_ptr, "aaaa", 4, "a", 1, "", 0, &out_len);
+  EXPECT_EQ(out_len, 0);
+  EXPECT_EQ(std::string(out_str, out_len), "");
+  EXPECT_FALSE(ctx.has_error());
+
+  // Edge case: result size one past the INT_MAX boundary. 65536 single-char
+  // matches each expanding to 32768 bytes gives max_length = 65536 * 32768 =
+  // 2^31 = INT_MAX + 1, so it is reported cleanly (guard fires before any alloc).

Review Comment:
   nit: consider adding a test case where max_length is exactly `INT_MAX`, so
   that the code proceeds without returning an error.
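   To illustrate the boundary being discussed, here is a minimal standalone sketch. It assumes the guard compares a 64-bit product of match count and replacement length against `INT_MAX`, as the test comment's arithmetic suggests. `OutputSizeFits`, `kInt32Max`, and `CheckBoundaries` are hypothetical names for illustration and are not taken from the Gandiva sources.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical stand-in for the size guard; not the actual Gandiva code.
constexpr int64_t kInt32Max = std::numeric_limits<int32_t>::max();

// Returns true when the assumed worst-case output size fits in an int32_t.
// The product is computed in 64 bits, so it cannot wrap for int32-sized inputs.
constexpr bool OutputSizeFits(int64_t num_matches, int64_t to_str_len) {
  return num_matches * to_str_len <= kInt32Max;
}

// Exercises both sides of the boundary.
void CheckBoundaries() {
  assert(!OutputSizeFits(65536, 32768));  // 2^31 = INT_MAX + 1: rejected
  assert(OutputSizeFits(1, kInt32Max));   // exactly INT_MAX: accepted
  assert(OutputSizeFits(65535, 32768));   // just under 2^31: accepted
}
```

   Under this assumed formula, the only products that land exactly on `INT_MAX` are 1 × `INT_MAX` and `INT_MAX` × 1, because 2^31 - 1 is prime. A real test at that size would need an output buffer of roughly 2 GB, so it may be more practical to cover the boundary at the guard level.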


