lalalastella opened a new pull request, #5792:
URL: https://github.com/apache/texera/pull/5792

   ### What changes were proposed in this PR?
   
   Add `DistributedAggregationSpec.scala`, a dedicated unit spec for `DistributedAggregation` in `common/workflow-operator`.
   
   `DistributedAggregation[P]` defines the four functions (`init`, `iterate`, `merge`, `finalAgg`) of a data-parallel aggregation, following the SOSP'09 paper. No dedicated spec currently pins this contract, so an accidental refactor of the case class (a field rename or reordering) could silently break every aggregation operator.
   
   The spec uses a representative **average** aggregation, `DistributedAggregation[(Double, Long)]` (partial = `(sum, count)`), and asserts:
   
   | Step | Contract verified |
   |------|-------------------|
   | `init()` | returns identity partial `(0.0, 0L)` |
   | `iterate` | folds one tuple's value into `(sum + v, count + 1)` |
   | `merge` | additive combination; commutative and associative; merge with `init()` is a no-op |
   | `finalAgg` | computes `sum / count` |
   | Distributed == single-node | split across two partitions → merge → finalAgg == single-node fold |
   | Empty partition | contributes `init()` and leaves the other partial unchanged |
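   
   For reviewers who want the contract in one place, here is a minimal sketch of the properties in the table. It is not the spec itself: `Agg` is a hypothetical, simplified stand-in for `DistributedAggregation` that folds plain `Double` values rather than Texera tuples, and `AverageSketch`, `avg`, and `partial` are illustrative names, not code from this PR.

   ```scala
   // Hypothetical stand-in for the four-function contract. The real
   // DistributedAggregation works on Texera tuples and may differ in shape.
   final case class Agg[T, P](
       init: () => P,
       iterate: (P, T) => P,
       merge: (P, P) => P,
       finalAgg: P => Double
   )

   object AverageSketch {
     // Partial state is (sum, count), as in the PR's average example.
     val avg: Agg[Double, (Double, Long)] = Agg[Double, (Double, Long)](
       init = () => (0.0, 0L),
       iterate = (p, v) => (p._1 + v, p._2 + 1L),
       merge = (a, b) => (a._1 + b._1, a._2 + b._2),
       finalAgg = p => p._1 / p._2
     )

     // Fold one partition's values into a partial, starting from init().
     def partial(values: Seq[Double]): (Double, Long) =
       values.foldLeft(avg.init())(avg.iterate)

     def main(args: Array[String]): Unit = {
       val data = Seq(1.0, 2.0, 3.0, 4.0)
       val (left, right) = data.splitAt(1)

       // Distributed == single-node: merging per-partition partials
       // gives the same result as one fold over all the data.
       val single = avg.finalAgg(partial(data))
       val distributed = avg.finalAgg(avg.merge(partial(left), partial(right)))
       assert(single == distributed)

       // merge is commutative for this aggregation.
       assert(
         avg.merge(partial(left), partial(right)) ==
           avg.merge(partial(right), partial(left))
       )

       // Empty partition: contributes init() and leaves the other partial unchanged.
       assert(partial(Seq.empty) == avg.init())
       assert(avg.merge(partial(data), partial(Seq.empty)) == partial(data))

       println(single) // prints 2.5
     }
   }
   ```

   The input values are chosen so that every sum is exactly representable as a `Double`, which is why the sketch can compare results with `==`; a spec over arbitrary inputs would likely need a tolerance instead.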
   
   Tuple/Schema/Attribute helpers follow the `AggregateOpSpec` pattern in the 
same package.
   
   No production-code changes.
   
   ### Any related issues, documentation, discussions?
   
   Closes #5777
   
   ### How was this PR tested?
   
   ```
   sbt "WorkflowOperator/testOnly org.apache.texera.amber.operator.aggregate.DistributedAggregationSpec"
   ```
   
   ### Was this PR authored or co-authored using generative AI tooling?
   
   Generated-by: Claude Sonnet 4.6

