github-actions[bot] commented on code in PR #63784:
URL: https://github.com/apache/doris/pull/63784#discussion_r3315327145


##########
be/src/exprs/function/url/find_symbols.h:
##########
@@ -330,7 +330,68 @@ inline const char* find_first_symbols_sse42(const char* 
const begin, const char*
     return return_mode == ReturnMode::End ? end : nullptr;
 }
 
-/// NOTE No SSE 4.2 implementation for find_last_symbols_or_null. Not worth to 
do.
+template <bool positive, ReturnMode return_mode>
+inline const char* find_last_symbols_sse2(const char* const begin, const char* 
const end,
+                                          const char* symbols, size_t 
num_chars) {
+    if (begin >= end) return return_mode == ReturnMode::End ? end : nullptr;
+
+    const char* pos = end;
+
+#if defined(__SSE2__)
+    const auto needles = mm_is_in_prepare(symbols, num_chars);
+    for (; pos - 16 >= begin; pos -= 16) {
+        __m128i bytes = _mm_loadu_si128(reinterpret_cast<const __m128i*>(pos - 
16));
+
+        __m128i eq = mm_is_in_execute(bytes, needles, num_chars);
+
+        uint16_t bit_mask = 
maybe_negate<positive>(uint16_t(_mm_movemask_epi8(eq)));
+        if (bit_mask) return pos - 1 - (__builtin_clz(bit_mask) - 16);
+    }
+#endif
+
+    --pos;
+    for (; pos >= begin; --pos)
+        if (maybe_negate<positive>(is_in(*pos, symbols, num_chars))) return 
pos;

Review Comment:
   The scalar tail has undefined behavior when every remaining byte matches the 
trim set. For example, `find_last_not_symbols_or_null(s.data(), s.data() + 4, 
SearchSymbols(" "))` on four spaces reaches `pos == begin`, the loop body does 
not return, and the `--pos` in the loop expression decrements before the start 
of the array; the next `pos >= begin` comparison is also outside the valid 
range. This is the runtime SSE2 path used by the new ASCII rtrim optimization. 
Please use the same `for (const char* p = pos; p != begin;) { --p; ... }` 
pattern used in the SSE4.2 tail or otherwise avoid forming `begin - 1`.



##########
be/src/exprs/function/function_string_concat.h:
##########
@@ -608,57 +607,70 @@ class FunctionStringRepeat : public IFunction {
                                     argument_ptr[0]->get_name(), 
argument_ptr[1]->get_name());
     }
 
-    Status vector_vector(const ColumnString::Chars& data, const 
ColumnString::Offsets& offsets,
-                         const ColumnInt32::Container& repeats, 
ColumnString::Chars& res_data,
-                         ColumnString::Offsets& res_offsets,
+    Status vector_vector(const ColumnString& source_column, const 
ColumnInt32::Container& repeats,
+                         ColumnString::Chars& res_data, ColumnString::Offsets& 
res_offsets,
                          ColumnUInt8::Container& null_map) const {
-        size_t input_row_size = offsets.size();
-
-        fmt::memory_buffer buffer;
-        res_offsets.resize(input_row_size);
-        null_map.resize_fill(input_row_size, 0);
-        for (ssize_t i = 0; i < input_row_size; ++i) {
-            buffer.clear();
-            const char* raw_str = reinterpret_cast<const 
char*>(&data[offsets[i - 1]]);
-            size_t size = offsets[i] - offsets[i - 1];
-            int repeat = repeats[i];
-            if (repeat <= 0) {
-                StringOP::push_empty_string(i, res_data, res_offsets);
-            } else {
-                ColumnString::check_chars_length(repeat * size + 
res_data.size(), 0);
-                for (int j = 0; j < repeat; ++j) {
-                    buffer.append(raw_str, raw_str + size);
-                }
-                StringOP::push_value_string(std::string_view(buffer.data(), 
buffer.size()), i,
-                                            res_data, res_offsets);
-            }
-        }
-        return Status::OK();
+        return _repeat_execute(
+                source_column, [&repeats](size_t index) { return 
repeats[index]; }, res_data,
+                res_offsets, null_map);
     }
 
     // TODO: 1. use pmr::vector<char> replace fmt_buffer may speed up the code
     //       2. abstract the `vector_vector` and `vector_const`
     //       3. rethink we should use `DEFAULT_MAX_STRING_SIZE` to bigger here
-    void vector_const(const ColumnString::Chars& data, const 
ColumnString::Offsets& offsets,
-                      int repeat, ColumnString::Chars& res_data, 
ColumnString::Offsets& res_offsets,
-                      ColumnUInt8::Container& null_map) const {
-        size_t input_row_size = offsets.size();
+    Status vector_const(const ColumnString& source_column, int repeat,
+                        ColumnString::Chars& res_data, ColumnString::Offsets& 
res_offsets,
+                        ColumnUInt8::Container& null_map) const {
+        return _repeat_execute(
+                source_column, [repeat](size_t) { return repeat; }, res_data, 
res_offsets,
+                null_map);
+    }
+
+private:
+    template <typename RepeatGetter>
+    Status _repeat_execute(const ColumnString& source_column, RepeatGetter&& 
repeat_getter,
+                           ColumnString::Chars& res_data, 
ColumnString::Offsets& res_offsets,
+                           ColumnUInt8::Container& null_map) const {
+        const auto input_row_size = source_column.size();
 
-        fmt::memory_buffer buffer;
         res_offsets.resize(input_row_size);
         null_map.resize_fill(input_row_size, 0);
-        for (ssize_t i = 0; i < input_row_size; ++i) {
-            buffer.clear();
-            const char* raw_str = reinterpret_cast<const 
char*>(&data[offsets[i - 1]]);
-            size_t size = offsets[i] - offsets[i - 1];
-            ColumnString::check_chars_length(repeat * size + res_data.size(), 
0);
 
-            for (int j = 0; j < repeat; ++j) {
-                buffer.append(raw_str, raw_str + size);
+        uint32_t total_size = 0;
+        for (size_t i = 0; i < input_row_size; ++i) {
+            const int repeat = repeat_getter(i);
+            if (repeat <= 0) {
+                res_offsets[i] = total_size;
+                continue;
             }
-            StringOP::push_value_string(std::string_view(buffer.data(), 
buffer.size()), i, res_data,
-                                        res_offsets);
+
+            const auto str_ref = source_column.get_data_at(i);
+            const uint32_t size = str_ref.size;
+            total_size += size * repeat;
+            res_offsets[i] = total_size;

Review Comment:
   This pre-size pass can overflow before `ColumnString::check_chars_length()` 
sees the real output size. Both `total_size` and `size` are `uint32_t`, and 
`size * repeat` is evaluated in 32-bit unsigned arithmetic, so a large but 
valid input such as one row with `size=65536` and `repeat=65536` wraps to zero. 
The later `res_data.resize(total_size)` then allocates too little and the 
second pass writes the full repeated payload through `fast_repeat`, causing 
memory corruption instead of raising `STRING_OVERFLOW_IN_VEC_ENGINE`. Please 
compute in `size_t`/checked arithmetic and call `check_chars_length()` before 
narrowing into `ColumnString::Offset`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to