aglinxinyuan opened a new issue, #5777:
URL: https://github.com/apache/texera/issues/5777

   ### Task Summary
   
   Add a dedicated `DistributedAggregationSpec.scala` that pins the 
four-function distributed-aggregation contract end-to-end using a 
representative aggregation (e.g. average): local partials computed per "node" 
and then merged must equal a single-node fold.
   
   ## Background
   
   `DistributedAggregation[P <: AnyRef]` 
(`operator/aggregate/DistributedAggregation.scala`) is the case class that 
defines how an aggregate is computed in a data-parallel engine via four 
functions: `init`, `iterate`, `merge`, and `finalAgg` (the pattern from the 
SOSP'09 *Distributed Aggregation* paper). It currently has no dedicated unit 
spec.
   
   ```scala
   case class DistributedAggregation[P <: AnyRef](
       init: () => P,                 // initial partial
       iterate: (P, Tuple) => P,      // accumulate one input tuple
       merge: (P, P) => P,            // combine two partials
       finalAgg: (P) => Object        // partial -> final value
   )
   ```
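   
   As a rough illustration of how the four functions compose, the sketch below folds each partition locally, merges the partials, and finalizes. It is not Texera code: `Tuple` here is a hypothetical stand-in for the real class, and `AggregationDriver`, `run`, and `count` are names invented for this example. The count partial is a `java.lang.Long` because the `P <: AnyRef` bound rules out Scala's primitive `Long`.
   
   ```scala
   // Hypothetical stand-in for Texera's Tuple so the sketch compiles on its own.
   final case class Tuple(fields: Map[String, Any])

   // Same shape as the case class quoted above.
   case class DistributedAggregation[P <: AnyRef](
       init: () => P,
       iterate: (P, Tuple) => P,
       merge: (P, P) => P,
       finalAgg: P => Object
   )

   object AggregationDriver {
     // Assumed driver: one local fold per partition, then merge, then finalize.
     def run[P <: AnyRef](
         agg: DistributedAggregation[P],
         partitions: Seq[Seq[Tuple]]
     ): Object = {
       val partials = partitions.map(_.foldLeft(agg.init())(agg.iterate))
       agg.finalAgg(partials.foldLeft(agg.init())(agg.merge))
     }

     // Illustrative count aggregation with a boxed Long partial.
     val count: DistributedAggregation[java.lang.Long] =
       DistributedAggregation[java.lang.Long](
         init = () => java.lang.Long.valueOf(0L),
         iterate = (p, _) => java.lang.Long.valueOf(p.longValue() + 1L),
         merge = (a, b) => java.lang.Long.valueOf(a.longValue() + b.longValue()),
         finalAgg = p => p
       )
   }
   ```
   
   Because `run` starts the merge from `init()`, an empty partition simply contributes the identity partial.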
   
   ## Behavior to pin
   
   Define a representative average aggregation `DistributedAggregation[(Double, 
Long)]` over a single numeric column and assert:
   
   | Step | Contract |
   | --- | --- |
   | `init()` | returns the identity partial `(0.0, 0L)` |
   | `iterate` | folds a tuple's value in: `(sum + v, count + 1)` |
   | `merge` | combines two partials additively; commutative/associative |
   | `finalAgg` | `(sum, count) => sum / count`, boxed to `Object`; yields `NaN` when `count == 0` |
   | distributed == single-node | split the input tuples across two partitions, 
`iterate` each locally from `init()`, `merge` the partials, `finalAgg` → equals 
the average from folding all tuples in one partition |
   | empty partition | a partition with no tuples contributes `init()` and 
leaves the merged result unchanged |
   
   Build `Tuple`s with the `Schema` / `Attribute` / `Tuple` helpers — see 
`AggregateOpSpec` in the same package for the pattern.
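   
   A minimal sketch of the representative average aggregation and the checks in the table, under stated assumptions: `Tuple` is a hypothetical single-value stand-in (the real spec would build tuples with the `Schema` / `Attribute` / `Tuple` helpers), the case class is repeated so the block is self-contained, and `AverageSketch`, `avg`, and `foldPartition` are illustrative names only.
   
   ```scala
   // Hypothetical stand-in for Texera's Tuple: one numeric column.
   final case class Tuple(value: Double)

   // Repeated from the issue so the sketch compiles on its own.
   case class DistributedAggregation[P <: AnyRef](
       init: () => P,
       iterate: (P, Tuple) => P,
       merge: (P, P) => P,
       finalAgg: P => Object
   )

   object AverageSketch {
     type Partial = (Double, Long) // (sum, count)

     val avg: DistributedAggregation[Partial] = DistributedAggregation[Partial](
       init = () => (0.0, 0L),                          // identity partial
       iterate = (p, t) => (p._1 + t.value, p._2 + 1),  // fold one tuple in
       merge = (a, b) => (a._1 + b._1, a._2 + b._2),    // additive combine
       finalAgg = p => Double.box(p._1 / p._2)          // sum / count, boxed
     )

     // Local fold of one partition, starting from init().
     def foldPartition(tuples: Seq[Tuple]): Partial =
       tuples.foldLeft(avg.init())(avg.iterate)
   }

   object AverageSketchCheck extends App {
     import AverageSketch._

     val all = Seq(1.0, 2.0, 3.0, 6.0).map(Tuple(_))
     val (left, right) = all.splitAt(1)

     // distributed == single-node
     val merged = avg.merge(foldPartition(left), foldPartition(right))
     assert(avg.finalAgg(merged) == avg.finalAgg(foldPartition(all)))

     // merge is commutative for these partials
     assert(avg.merge(foldPartition(right), foldPartition(left)) == merged)

     // an empty partition contributes init() and changes nothing
     assert(foldPartition(Seq.empty) == avg.init())
     assert(avg.merge(merged, foldPartition(Seq.empty)) == merged)
   }
   ```
   
   The sample values are chosen so every partial sum is exactly representable as a `Double`; with arbitrary inputs, the distributed and single-node sums can differ in the last bits, so a tolerance-based comparison may be the safer assertion.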
   
   ## Scope
   
   - New spec: `DistributedAggregationSpec.scala` under 
`common/workflow-operator/src/test/scala/org/apache/texera/amber/operator/aggregate/`.
   - The spec supplies its own representative aggregation functions — the goal 
is to pin the case class's contract/wiring, not any specific operator.
   - No production-code changes.
   
   ### Task Type
   - [ ] Refactor / Cleanup
   - [ ] DevOps / Deployment / CI
   - [x] Testing / QA
   - [ ] Documentation
   - [ ] Performance
   - [ ] Other
   

