github-actions[bot] commented on PR #61156: URL: https://github.com/apache/doris/pull/61156#issuecomment-4035922700
## Code Review Summary: [Feature](func) Support REGEXP_EXTRACT_ALL_ARRAY **Overall Assessment: No significant issues found.** The PR is well-structured, clean, and follows existing conventions. ### Critical Checkpoint Conclusions 1. **Goal and Test Coverage**: The goal is to add a `regexp_extract_all_array` function that returns `Array<String>` instead of a string-formatted array. The code accomplishes this goal. Tests include BE unit tests (7 match cases, null input propagation, invalid pattern), regression tests (11 cases covering NULL, table columns, empty results, 0 capture groups, non-const patterns), which provide good coverage. 2. **Modification size/clarity**: The change is well-scoped. The refactoring from monolithic `execute_impl` to a Handler-based template strategy pattern is a clean approach that avoids code duplication between the string output and array output variants. The RegexpExtractImpl refactoring (moving execute logic from the wrapper class into the Impl struct) is a separate concern but keeps the two Impl structs consistent. 3. **Concurrency**: No concurrency concerns. The function execution is single-threaded per pipeline task. 4. **Lifecycle management**: No special lifecycle issues. Function state uses `shared_ptr<void>` for automatic cleanup. 5. **Configuration items**: None added. 6. **Incompatible changes / upgrade compatibility**: No incompatible changes. This is a purely additive new function. 7. **Parallel code paths**: The existing `regexp_extract_all` (string output) is refactored to use the same template infrastructure, maintaining behavioral compatibility. The string formatting logic (`['a','b']`) is preserved identically. 8. **Special conditional checks**: None. 9. **Test coverage**: Comprehensive - BE unit tests cover normal matches, empty results, NULL propagation, invalid patterns. Regression tests cover NULL, table columns with NULL values, non-const patterns (`concat`), 0 capture groups, and multiple match scenarios. 10. **Observability**: Not applicable for a simple scalar function. 11. **Transaction/persistence**: Not applicable. 12. **FE-BE variable passing**: The new function is registered in both FE (`BuiltinScalarFunctions`, `ScalarFunctionVisitor`, new `RegexpExtractAllArray.java`) and BE (`register_function_regexp_extract`). The FE class correctly follows the pattern of the existing `RegexpExtractAll` with `ArrayType` return type. 13. **Performance**: The Handler/template approach introduces zero runtime overhead (all dispatch is compile-time). The `std::visit` + `make_bool_variant` pattern for const/non-const dispatch is standard in the codebase. No redundant operations or anti-patterns. 14. **Other observations**: No issues found. The FE `AlwaysNullable` trait is appropriate. The array column construction (ColumnArray with nested ColumnNullable<ColumnString>) is correct. The `push_back`-based offset tracking in `RegexpExtractAllArrayOutput` correctly builds the array structure sequentially. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
