junaiddshaukat opened a new pull request, #39141:
URL: https://github.com/apache/beam/pull/39141

   ## Summary
   The runner's first stateful, shuffle-bearing transform. Simplest GroupByKey 
per
   the plan agreed with @je-ik: GlobalWindow, default trigger, no allowed 
lateness.
   
   - Repartition by Kafka Streams: the Beam key is set as the Kafka record key 
and
     shuffled through an internal repartition topic (serialized with the
     KStreamsPayload serde from #39051).
   - Values buffered per key in a Kafka Streams state store; each key emitted 
once
     as `KV<K, Iterable<V>>` when the watermark (WatermarkManager) reaches
     `TIMESTAMP_MAX_VALUE`.
   - A broadcast partitioner fans the watermark out to all repartition 
partitions;
     the processor fires idempotently (a `fired` guard) since the terminal
     watermark arrives on every partition.
   
   ## Testing
   `GroupByKeyTest` drives `Impulse -> emit KVs -> GroupByKey -> record` end to 
end
   via TopologyTestDriver, round-tripping the repartition topic by hand (the 
driver
   does not loop a low-level sink back into its source). `:check` green, 36 
tests.
   
   ## Out of scope (follow-ups)
   - Non-global windows, triggers, allowed lateness — will reuse runner-core
     `GroupAlsoByWindow` (needs state + timers).
   - First `@ValidatesRunner` test — the gradle wiring is its own chunk; this 
PR's
     E2E test already proves grouping correctness.
   
   Closes #39135
   Refs #18479
   cc @je-ik


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to