github-actions[bot] commented on PR #61156:
URL: https://github.com/apache/doris/pull/61156#issuecomment-4035922700

   ## Code Review Summary: [Feature](func) Support REGEXP_EXTRACT_ALL_ARRAY
   
   **Overall Assessment: No significant issues found.** The PR is 
well-structured, clean, and follows existing conventions.
   
   ### Critical Checkpoint Conclusions
   
   1. **Goal and Test Coverage**: The goal is to add a 
`regexp_extract_all_array` function that returns `Array<String>` instead of a 
string-formatted array. The code accomplishes this goal. Tests include BE unit 
tests (7 match cases, null input propagation, invalid pattern), regression 
tests (11 cases covering NULL, table columns, empty results, 0 capture groups, 
non-const patterns), which provide good coverage.
   
   2. **Modification size/clarity**: The change is well-scoped. The refactoring 
from monolithic `execute_impl` to a Handler-based template strategy pattern is 
a clean approach that avoids code duplication between the string output and 
array output variants. The RegexpExtractImpl refactoring (moving execute logic 
from the wrapper class into the Impl struct) is a separate concern but keeps 
the two Impl structs consistent.
   
   3. **Concurrency**: No concurrency concerns. The function execution is 
single-threaded per pipeline task.
   
   4. **Lifecycle management**: No special lifecycle issues. Function state 
uses `shared_ptr<void>` for automatic cleanup.
   
   5. **Configuration items**: None added.
   
   6. **Incompatible changes / upgrade compatibility**: No incompatible 
changes. This is a purely additive new function.
   
   7. **Parallel code paths**: The existing `regexp_extract_all` (string 
output) is refactored to use the same template infrastructure, maintaining 
behavioral compatibility. The string formatting logic (`['a','b']`) is 
preserved identically.
   
   8. **Special conditional checks**: None.
   
   9. **Test coverage**: Comprehensive - BE unit tests cover normal matches, 
empty results, NULL propagation, invalid patterns. Regression tests cover NULL, 
table columns with NULL values, non-const patterns (`concat`), 0 capture 
groups, and multiple match scenarios.
   
   10. **Observability**: Not applicable for a simple scalar function.
   
   11. **Transaction/persistence**: Not applicable.
   
   12. **FE-BE variable passing**: The new function is registered in both FE 
(`BuiltinScalarFunctions`, `ScalarFunctionVisitor`, new 
`RegexpExtractAllArray.java`) and BE (`register_function_regexp_extract`). The 
FE class correctly follows the pattern of the existing `RegexpExtractAll` with 
`ArrayType` return type.
   
   13. **Performance**: The Handler/template approach introduces zero runtime 
overhead (all dispatch is compile-time). The `std::visit` + `make_bool_variant` 
pattern for const/non-const dispatch is standard in the codebase. No redundant 
operations or anti-patterns.
   
   14. **Other observations**: No issues found. The FE `AlwaysNullable` trait 
is appropriate. The array column construction (ColumnArray with nested 
ColumnNullable<ColumnString>) is correct. The `push_back`-based offset tracking 
in `RegexpExtractAllArrayOutput` correctly builds the array structure 
sequentially.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to