github-actions[bot] commented on PR #61352:
URL: https://github.com/apache/doris/pull/61352#issuecomment-4159542252

   ## Code Review Summary
   
   Reviewed the full PR (34 files, +2166/-591 lines) covering BE template 
refactor, 4 new FE aggregate function classes, factory registration, visitor 
changes, and all 9 regression test suites.
   
   ### Critical Checkpoint Conclusions
   
   1. **Goal & Correctness**: The PR adds `REGR_AVGX`, `REGR_AVGY`, 
`REGR_COUNT`, `REGR_R2` and refactors the 5 existing `REGR_*` functions into a 
unified template. The new template correctly implements the Youngs-Cramer 
online algorithm. `regr_r2` follows the SQL standard (NULL when n<1 or Sxx=0, 
1.0 when Syy=0, otherwise Sxy²/(Sxx·Syy)). `regr_count` correctly returns 0 for 
empty input (via `NotNullableAggregateFunction` + `resultForEmptyInput`). Tests 
prove correctness across edge cases.
   
   2. **Modification scope**: Focused and clean. The refactor replaces 
per-function files with a single parameterized template — a clear improvement 
in maintainability.
   
   3. **Concurrency**: Not applicable. Aggregate function states are 
single-threaded per aggregation context.
   
   4. **Lifecycle management**: No special lifecycle concerns. States are POD 
types (`std::tuple<UInt64>` for count, `std::tuple<UInt64, Float64, ...>` for 
others), no heap allocations.
   
   5. **Configuration items**: None added.
   
   6. **Incompatible changes / Serialization**: **Verified safe.** All 5 
existing functions have identical serialization layouts between old and new 
code. `regr_intercept` deliberately keeps `sy_level=2` (storing `syy` even 
though unused) to preserve backward compatibility, with an explicit comment 
explaining this design choice.
   
   7. **Parallel code paths**: All 9 regr functions now share a single code 
path via `RegrFunctionKind` enum + `RegrTraits` specializations. No parallel 
paths to worry about.
   
   8. **Special conditional checks**: The `if constexpr` guards on moment 
accessors are correct and well-commented. The floating-point exact-zero 
comparisons (`sxx() == 0.0`, `syy() == 0.0`) are safe because the Youngs-Cramer 
algorithm guarantees exact zero when all input values are identical.
   
   9. **Test coverage**: Comprehensive regression tests for all 4 new functions 
covering: empty tables, int/double inputs, nullable/non-nullable columns, 
literal inputs, constant-value edge cases, string/boolean/null inputs, and 
error cases. Existing function tests updated to match the relaxed type 
validation.
   
   10. **Observability**: Not applicable for aggregate functions — no new 
logging or metrics needed.
   
   11. **Transaction/persistence**: Not applicable.
   
   12. **Data writes**: Not applicable.
   
   13. **FE-BE variable passing**: All new functions correctly registered in 
both FE (`BuiltinAggregateFunctions.java`) and BE 
(`aggregate_function_regr.cpp`). Function names are lowercase and consistent. 
Return types agree (DoubleType for all except RegrCount which returns 
BigIntType). Nullable semantics match (`AlwaysNullable` for avgx/avgy/r2, 
`NotNullableAggregateFunction` for count).
   
   14. **Performance**: The unified template avoids code duplication without 
introducing overhead. `if constexpr` ensures dead code is eliminated at compile 
time. No redundant computations detected.
   
   15. **Other observations**:
       - Type validation was broadened: string and boolean types are now 
accepted (implicit cast to double) for all 9 regr functions. This is a 
behavioral change for the 5 existing functions but is consistent and reasonable.
       - Minor floating-point precision differences in `.out` files for 
existing functions (e.g., `5.512931034482759` → `5.512931034482757`) are 
expected from the refactored computation path and are within acceptable 
tolerance.
       - Pre-existing issue: some existing regr_slope/regr_intercept tests have 
duplicate `qt_` tags (e.g., `qt_sql_int_8` appears twice). Not introduced by 
this PR.
   
   ### Verdict
   
   **No blocking issues found.** The PR is well-designed, 
serialization-compatible, and thoroughly tested. The unified template approach 
is a clear improvement over the previous per-function implementation.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to