github-actions[bot] commented on PR #60910: URL: https://github.com/apache/doris/pull/60910#issuecomment-4096628426
## Code Review Summary: [Feature](func) Support table function json_each, json_each_text **Overall Assessment: No blocking issues found. This is a clean, well-structured feature addition.** This PR adds two new table generating functions (`json_each`, `json_each_text`) and their `_outer` variants, following the established `explode_json_object` pattern closely while incorporating improvements (const column optimization, explicit nullable handling, template-based code sharing). --- ### Critical Checkpoint Conclusions **1. Goal and correctness:** The goal is to add PostgreSQL-compatible `json_each` / `json_each_text` table functions that expand top-level JSON object keys into rows. The implementation correctly accomplishes this. Both BE unit tests and regression tests (30 test cases) prove correctness across normal, edge, and negative cases. **2. Modification scope:** The change is focused and minimal: 4 new FE classes (JsonEach/JsonEachText + Outer variants), 2 new BE files (vjson_each.h/.cpp), registration in 3 existing files, and comprehensive tests. No unrelated modifications. **3. Concurrency:** Not applicable — table functions operate within a single pipeline task context with no shared mutable state across threads. **4. Lifecycle management:** Verified correct. `_kv_pairs` and `_json_column` are properly nulled in `process_close()`, and `_cur_size` is reset to 0. The `process_init` → `process_row(0..N)` → `process_close` lifecycle is correctly maintained per block. The const column optimization is correct: base class `TableFunction::process_row()` conditionally preserves `_cur_size` when `_is_const`, allowing safe reuse of parsed `_kv_pairs` across rows within a block. **5. Configuration items:** None added — not applicable. **6. Incompatible changes:** None — this is a purely additive feature with no storage format or protocol changes. **7. Parallel code paths:** The implementation correctly mirrors `explode_json_object` patterns. All four FE/BE registration points are complete: `BuiltinTableGeneratingFunctions`, `TableGeneratingFunctionVisitor`, `table_function_factory.cpp`, and `function_fake.cpp`. The `_outer` suffix is handled by the existing generic framework (suffix stripping + `set_outer()`). **8. Special conditional checks:** The `!jv->isObject()` early-return (producing 0 rows for non-objects) correctly implements json_each semantics. The `_is_const && _cur_size > 0` optimization has clear intent from context. **9. Test coverage:** - **Regression tests**: 30 test cases covering json_each, json_each_text, outer variants, multiple LATERAL VIEWs, and corner cases (deep nesting, special chars, large objects, empty keys, const inputs). Tests follow all Doris standards: `ORDER BY` for determinism, `DROP TABLE IF EXISTS` before create, hardcoded table names, auto-generated `.out` file. - **BE unit tests**: 6 Google Test cases covering both variants, get_same_many_values, outer semantics, NULL/empty/non-object inputs. - No FE unit tests, but the FE classes are pure boilerplate (signature + visitor dispatch) with no logic to test. **10. Observability:** Not needed — table functions do not require dedicated metrics or logging beyond what the pipeline framework provides. **11. Transaction/persistence:** Not applicable — no metadata or EditLog changes. **12. Data writes:** Not applicable — read-only table function. **13. FE-BE variable passing:** Function names are passed via Thrift string (`TFunction.name.function_name`). No new Thrift types needed. Factory registration verified correct for both base and outer variants. **14. Performance:** - Template design (`TEXT_MODE`) avoids code duplication with zero runtime cost. - Const column optimization avoids redundant JSON re-parsing (improvement over `explode_json_object` which lacks this). - `JsonbWriter` is reused across iterations (not recreated per key-value pair) in the JSON mode loop. - The `(int)(_cur_size - _cur_offset)` C-style narrowing cast in `get_value` is consistent with all 9 other table functions in the codebase — not a new concern. - `compile_check_begin.h` / `compile_check_end.h` are properly paired in both new files. **15. Other observations:** - The DCHECK→DORIS_CHECK migration in `function_fake.cpp` is a good improvement (assertions survive release builds). - The `_is_const` field reuse from the base class is correct — the base class conditionally resets `_cur_size` based on `_is_const`, and `reset()` always resets `_cur_offset` and `_eos`. - `ColumnNullable::insert_data()` correctly maintains null map sync for both null and non-null insertions — verified against the implementation. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
