junaiddshaukat opened a new pull request, #39141:
URL: https://github.com/apache/beam/pull/39141
## Summary
This PR adds the runner's first stateful, shuffle-bearing transform: the simplest form of GroupByKey, per the plan agreed with @je-ik (GlobalWindow, default trigger, no allowed lateness).
- Repartition by Kafka Streams: the Beam key is set as the Kafka record key, and records are shuffled through an internal repartition topic (serialized with the KStreamsPayload serde from #39051).
- Values are buffered per key in a Kafka Streams state store; each key is emitted once as `KV<K, Iterable<V>>` when the watermark (WatermarkManager) reaches `TIMESTAMP_MAX_VALUE`.
- A broadcast partitioner fans the watermark out to all repartition partitions; the processor fires idempotently (a `fired` guard) since the terminal watermark arrives on every partition.
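
The buffering, fire-once, and broadcast behavior described above can be sketched in plain Java as follows. This is only an illustrative model, not the code in this PR: `GroupByKeySketch`, `onElement`, `onWatermark`, and `partitionsFor` are hypothetical names, the in-memory map stands in for the Kafka Streams state store, and `TERMINAL_WATERMARK` is a placeholder rather than Beam's actual `TIMESTAMP_MAX_VALUE`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

/** Hypothetical stand-in for the grouping processor; not the PR's actual class. */
class GroupByKeySketch<K, V> {
  // Placeholder for Beam's TIMESTAMP_MAX_VALUE (assumption: the real constant differs).
  static final long TERMINAL_WATERMARK = Long.MAX_VALUE;

  // Stands in for the Kafka Streams state store: values buffered per key.
  private final Map<K, List<V>> buffer = new LinkedHashMap<>();
  private final BiConsumer<K, List<V>> output;
  // Guard so that repeated terminal watermarks emit each key only once.
  private boolean fired = false;

  GroupByKeySketch(BiConsumer<K, List<V>> output) {
    this.output = output;
  }

  /** Buffers one value under its key; nothing is emitted yet. */
  void onElement(K key, V value) {
    buffer.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
  }

  /** Emits every buffered group the first time the terminal watermark is seen. */
  void onWatermark(long watermark) {
    if (fired || watermark < TERMINAL_WATERMARK) {
      return; // not terminal yet, or already fired
    }
    fired = true;
    buffer.forEach(output);
    buffer.clear();
  }

  /** Data records target one partition by key hash; watermark records target all of them. */
  static List<Integer> partitionsFor(Object key, boolean isWatermark, int numPartitions) {
    List<Integer> targets = new ArrayList<>();
    if (isWatermark) {
      for (int p = 0; p < numPartitions; p++) {
        targets.add(p);
      }
    } else {
      targets.add(Math.floorMod(key.hashCode(), numPartitions));
    }
    return targets;
  }
}
```

In this model, calling `onWatermark(TERMINAL_WATERMARK)` twice emits each group only once, which mirrors the role of the `fired` guard when the terminal watermark is delivered more than once.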
## Testing
`GroupByKeyTest` drives `Impulse -> emit KVs -> GroupByKey -> record` end to end via TopologyTestDriver, round-tripping the repartition topic by hand (the driver does not loop a low-level sink back into its source). `:check` is green with 36 tests.
## Out of scope (follow-ups)
- Non-global windows, triggers, and allowed lateness: these will reuse runner-core `GroupAlsoByWindow` (needs state + timers).
- First `@ValidatesRunner` test: the Gradle wiring is its own chunk of work, and this PR's E2E test already proves grouping correctness.
Closes #39135
Refs #18479
cc @je-ik