tomstepp commented on code in PR #36112:
URL: https://github.com/apache/beam/pull/36112#discussion_r2353700818
##########
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Redistribute.java:
##########
@@ -157,8 +168,16 @@ public boolean getAllowDuplicates() {
@Override
public PCollection<T> expand(PCollection<T> input) {
-      return input
-          .apply("Pair with random key", ParDo.of(new AssignShardFn<>(numBuckets)))
+      PCollection<KV<Integer, T>> sharded;
+      if (deterministicSharding) {
+        sharded =
+            input.apply(
+                "Pair with deterministic key",
+                ParDo.of(new AssignDeterministicShardFn<T>(numBuckets)));
Review Comment:
Do you have any tips for how to do this? I reviewed the existing
RedistributeTest, but I'm not sure how to set the offsets for elements or the
ProcessContext. If we can set them, then we can verify that repeated elements
with the same offset get the same key.
Or are you thinking of just verifying that setting withDeterministicSharding
uses the new shard fn? I'm not sure how to verify that yet either, but I can
look into it some more.
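
One option, purely as a sketch: if the offset-to-key computation inside
AssignDeterministicShardFn can be factored into a pure static helper, it
becomes testable without constructing a ProcessContext at all. The helper name
`shardForOffset` and its hashing logic below are hypothetical and not part of
this PR; the point is only the shape of the test.

```java
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertTrue;

import org.junit.Test;

public class DeterministicShardFnTest {

  // Hypothetical pure helper that AssignDeterministicShardFn could delegate to;
  // the real key computation in the PR may differ.
  private static int shardForOffset(long offset, int numBuckets) {
    return Math.floorMod(Long.hashCode(offset), numBuckets);
  }

  @Test
  public void sameOffsetAlwaysMapsToSameShard() {
    int numBuckets = 16;
    for (long offset = 0; offset < 1000; offset++) {
      int first = shardForOffset(offset, numBuckets);
      int second = shardForOffset(offset, numBuckets);
      // Repeated elements with the same offset must get the same key.
      assertEquals(first, second);
      // Keys must stay within [0, numBuckets).
      assertTrue(first >= 0 && first < numBuckets);
    }
  }
}
```

An end-to-end check through TestPipeline/PAssert could also exercise the
wiring, but a pure helper keeps the determinism check independent of runner
behavior and element ordering.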