github-actions[bot] commented on PR #60910:
URL: https://github.com/apache/doris/pull/60910#issuecomment-4096628426

   ## Code Review Summary: [Feature](func) Support table function json_each, 
json_each_text
   
   **Overall Assessment: No blocking issues found. This is a clean, 
well-structured feature addition.**
   
   This PR adds two new table generating functions (`json_each`, 
`json_each_text`) and their `_outer` variants, following the established 
`explode_json_object` pattern closely while incorporating improvements (const 
column optimization, explicit nullable handling, template-based code sharing).
   
   ---
   
   ### Critical Checkpoint Conclusions
   
   **1. Goal and correctness:**
   The goal is to add PostgreSQL-compatible `json_each` / `json_each_text` 
table functions that expand top-level JSON object keys into rows. The 
implementation correctly accomplishes this. Both BE unit tests and regression 
tests (30 test cases) prove correctness across normal, edge, and negative cases.
   
   **2. Modification scope:**
   The change is focused and minimal: 4 new FE classes (JsonEach/JsonEachText + 
Outer variants), 2 new BE files (vjson_each.h/.cpp), registration in 3 existing 
files, and comprehensive tests. No unrelated modifications.
   
   **3. Concurrency:**
   Not applicable — table functions operate within a single pipeline task 
context with no shared mutable state across threads.
   
   **4. Lifecycle management:**
   Verified correct. `_kv_pairs` and `_json_column` are properly nulled in 
`process_close()`, and `_cur_size` is reset to 0. The `process_init` → 
`process_row(0..N)` → `process_close` lifecycle is correctly maintained per 
block. The const column optimization is correct: base class 
`TableFunction::process_row()` conditionally preserves `_cur_size` when 
`_is_const`, allowing safe reuse of parsed `_kv_pairs` across rows within a 
block.
   
   **5. Configuration items:**
   None added — not applicable.
   
   **6. Incompatible changes:**
   None — this is a purely additive feature with no storage format or protocol 
changes.
   
   **7. Parallel code paths:**
   The implementation correctly mirrors `explode_json_object` patterns. All 
four FE/BE registration points are complete: `BuiltinTableGeneratingFunctions`, 
`TableGeneratingFunctionVisitor`, `table_function_factory.cpp`, and 
`function_fake.cpp`. The `_outer` suffix is handled by the existing generic 
framework (suffix stripping + `set_outer()`).
   
   **8. Special conditional checks:**
   The `!jv->isObject()` early-return (producing 0 rows for non-objects) 
correctly implements json_each semantics. The `_is_const && _cur_size > 0` 
optimization has clear intent from context.
   
   **9. Test coverage:**
   - **Regression tests**: 30 test cases covering json_each, json_each_text, 
outer variants, multiple LATERAL VIEWs, and corner cases (deep nesting, special 
chars, large objects, empty keys, const inputs). Tests follow all Doris 
standards: `ORDER BY` for determinism, `DROP TABLE IF EXISTS` before create, 
hardcoded table names, auto-generated `.out` file.
   - **BE unit tests**: 6 Google Test cases covering both variants, 
get_same_many_values, outer semantics, NULL/empty/non-object inputs.
   - No FE unit tests, but the FE classes are pure boilerplate (signature + 
visitor dispatch) with no logic to test.
   
   **10. Observability:**
   Not needed — table functions do not require dedicated metrics or logging 
beyond what the pipeline framework provides.
   
   **11. Transaction/persistence:**
   Not applicable — no metadata or EditLog changes.
   
   **12. Data writes:**
   Not applicable — read-only table function.
   
   **13. FE-BE variable passing:**
   Function names are passed via Thrift string 
(`TFunction.name.function_name`). No new Thrift types needed. Factory 
registration verified correct for both base and outer variants.
   
   **14. Performance:**
   - Template design (`TEXT_MODE`) avoids code duplication with zero runtime 
cost.
   - Const column optimization avoids redundant JSON re-parsing (improvement 
over `explode_json_object` which lacks this).
   - `JsonbWriter` is reused across iterations (not recreated per key-value 
pair) in the JSON mode loop.
   - The `(int)(_cur_size - _cur_offset)` C-style narrowing cast in `get_value` 
is consistent with all 9 other table functions in the codebase — not a new 
concern.
   - `compile_check_begin.h` / `compile_check_end.h` are properly paired in 
both new files.
   
   **15. Other observations:**
   - The DCHECK→DORIS_CHECK migration in `function_fake.cpp` is a good 
improvement (assertions survive release builds).
   - The `_is_const` field reuse from the base class is correct — the base 
class conditionally resets `_cur_size` based on `_is_const`, and `reset()` 
always resets `_cur_offset` and `_eos`.
   - `ColumnNullable::insert_data()` correctly maintains null map sync for both 
null and non-null insertions — verified against the implementation.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to