Dracylfrr opened a new pull request, #5625:
URL: https://github.com/apache/texera/pull/5625

   ### What changes were proposed in this PR?
   
   This PR adds a new **Column Summary Statistics** workflow operator.
   
   The operator takes one input table and outputs one summary row per input 
column. The output includes:
   
   * `columnName`
   * `dataType`
   * `rowCount`
   * `nullCount`
   * `nonNullCount`
   * `minValue`
   * `maxValue`
   * `meanValue`
   
   For numeric columns, the operator computes `minValue`, `maxValue`, and 
`meanValue` in addition to row/null/non-null counts.
   
   For non-numeric columns, the operator reports row/null/non-null counts and 
leaves numeric summary fields as `null`.
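
The per-column behavior described above can be sketched in plain Scala. This is a minimal illustration under stated assumptions, not the PR's implementation: `ColumnSummary`, `ColumnSummarySketch`, `Row`, and `summarize` are hypothetical names, rows are modeled as `Seq[Option[Any]]` with `None` standing in for a null cell, a column is treated as numeric when all of its non-null values are `java.lang.Number` instances, and emitting no rows for empty input is assumed from the empty-input test case.

```scala
// Illustrative sketch only: every name here is hypothetical, not taken from the PR.
final case class ColumnSummary(
    columnName: String,
    dataType: String,
    rowCount: Long,
    nullCount: Long,
    nonNullCount: Long,
    minValue: Option[Double], // None stands in for a null summary field
    maxValue: Option[Double],
    meanValue: Option[Double]
)

object ColumnSummarySketch {
  // One cell per column; None models a null cell.
  type Row = Seq[Option[Any]]

  // schema: (columnName, dataType) pairs, in the same order as the cells of each row.
  def summarize(schema: Seq[(String, String)], rows: Seq[Row]): Seq[ColumnSummary] =
    if (rows.isEmpty) Seq.empty // assumption: no input tuples means no summary rows
    else
      schema.zipWithIndex.map { case ((name, dataType), idx) =>
        val cells = rows.map(row => row(idx))
        val nonNull = cells.flatten
        // Assumption: a column counts as numeric when all of its non-null values are numbers.
        val numbers = nonNull.collect { case n: java.lang.Number => n.doubleValue() }
        val numeric = numbers.nonEmpty && numbers.size == nonNull.size

        ColumnSummary(
          columnName = name,
          dataType = dataType,
          rowCount = cells.size.toLong,
          nullCount = (cells.size - nonNull.size).toLong,
          nonNullCount = nonNull.size.toLong,
          minValue = if (numeric) Some(numbers.reduce((a, b) => math.min(a, b))) else None,
          maxValue = if (numeric) Some(numbers.reduce((a, b) => math.max(a, b))) else None,
          meanValue = if (numeric) Some(numbers.sum / numbers.size) else None
        )
      }
}
```

With a schema of `age` and `name` and the rows `(10, "a")`, `(null, "b")`, `(20, null)`, this sketch produces two summary rows: `age` gets `rowCount = 3`, `nullCount = 1`, `nonNullCount = 2`, and min, max, and mean of 10.0, 20.0, and 15.0, while `name` gets the same counts and empty numeric fields.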
   
   This PR includes:
   
   * A new `ColumnSummaryStatisticsOpDesc`
   * A new `ColumnSummaryStatisticsOpExec`
   * A new `ColumnSummaryStatisticsOpExecConfig`
   * Operator registration in `LogicalOp`
   * Unit tests covering numeric, string, null, mixed-column, and empty-input 
behavior
   
   The operator is intentionally limited in scope to basic per-column summary 
statistics.
   
   ### Any related issues, documentation, discussions?
   
   Related to #____
   
   ### How was this PR tested?
   
   Added unit tests in:
   
   
`common/workflow-operator/src/test/scala/org/apache/texera/amber/operator/statistics/columnsummary/ColumnSummaryStatisticsOpExecSpec.scala`
   
   The tests cover:
   
   * Computing min, max, mean, row count, null count, and non-null count for an 
integer column
   * Computing numeric statistics for numeric columns while leaving `minValue`, 
`maxValue`, and `meanValue` as `null` for non-numeric columns
   * Returning one summary row for each input column
   * Returning no rows when no input tuples are processed
   
   Test command run locally:
   
   `sbt "WorkflowOperator / testOnly org.apache.texera.amber.operator.statistics.columnsummary.ColumnSummaryStatisticsOpExecSpec"`
   
   Result:
   
   `Tests: succeeded 4, failed 0`
   
   `All tests passed.`
   
   ### Was this PR authored or co-authored using generative AI tooling?
   
   Generated-by: ChatGPT (GPT-5.5 Thinking)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
