lalalastella opened a new pull request, #5792: URL: https://github.com/apache/texera/pull/5792
### What changes were proposed in this PR?

Add `DistributedAggregationSpec.scala`, a dedicated unit spec for `DistributedAggregation` in `common/workflow-operator`.

`DistributedAggregation[P]` defines four functions (`init`, `iterate`, `merge`, `finalAgg`) for data-parallel aggregation (SOSP'09 paper). There is no dedicated spec that pins this contract; accidental refactors to the case class (field renames, reordering) would break every aggregation operator silently.

The spec uses a representative **average** aggregation `DistributedAggregation[(Double, Long)]` (partial = `(sum, count)`) and asserts:

| Step | Contract verified |
|------|------------------|
| `init()` | returns identity partial `(0.0, 0L)` |
| `iterate` | folds one tuple's value into `(sum + v, count + 1)` |
| `merge` | additive combination; commutative and associative; merge with `init()` is a no-op |
| `finalAgg` | computes `sum / count` |
| Distributed == single-node | split across two partitions → merge → finalAgg == single-node fold |
| Empty partition | contributes `init()` and leaves the other partial unchanged |

Tuple/Schema/Attribute helpers follow the `AggregateOpSpec` pattern in the same package. No production-code changes.

### Any related issues, documentation, discussions?

Closes #5777

### How was this PR tested?

```
sbt "WorkflowOperator/testOnly org.apache.texera.amber.operator.aggregate.DistributedAggregationSpec"
```

### Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Sonnet 4.6
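
For readers unfamiliar with the four-function contract, the average aggregation described in this PR can be sketched roughly as follows. This is a minimal illustration, not the spec added by the PR: `SketchAggregation` and `AverageSketch` are hypothetical names, and `iterate` takes a plain `Double` here, whereas the real `DistributedAggregation` operates on Texera tuples and may use different field types.

```scala
// Hypothetical stand-in for the four-function contract. The field types of
// Texera's real DistributedAggregation may differ from this sketch.
final case class SketchAggregation[P](
    init: () => P,             // identity partial
    iterate: (P, Double) => P, // fold one value into a partial
    merge: (P, P) => P,        // combine partials from two partitions
    finalAgg: P => Double      // turn the merged partial into the result
)

object AverageSketch {
  // Partial state is (sum, count), as in the PR description.
  val average: SketchAggregation[(Double, Long)] =
    SketchAggregation[(Double, Long)](
      init = () => (0.0, 0L),
      iterate = (p, v) => (p._1 + v, p._2 + 1L),
      merge = (a, b) => (a._1 + b._1, a._2 + b._2),
      finalAgg = p => p._1 / p._2
    )

  // Fold one partition's values into a partial, starting from init().
  def partial(values: Seq[Double]): (Double, Long) =
    values.foldLeft(average.init())(average.iterate)

  def main(args: Array[String]): Unit = {
    val all = Seq(1.0, 2.0, 3.0, 4.0)
    val (left, right) = all.splitAt(1)

    // Single-node fold versus two partitions merged before finalAgg.
    val singleNode = average.finalAgg(partial(all))
    val distributed =
      average.finalAgg(average.merge(partial(left), partial(right)))
    assert(singleNode == distributed)

    // An empty partition contributes init() and changes nothing.
    assert(average.merge(partial(all), partial(Seq.empty)) == partial(all))

    println(s"average = $distributed") // prints "average = 2.5"
  }
}
```

The sketch uses whole-number inputs so that exact `==` comparisons hold; with arbitrary doubles, floating-point addition is not strictly associative, so a comparison with a tolerance would be the safer choice.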
