Hello Evan, This is not intentional, thank you for pointing it out. The correct code should be something like:
`com.google.common.primitives.UnsignedInteger.fromIntBits(hashOfShard).mod(UnsignedInteger.valueOf(numBuckets)).intValue()` If you could make this or equivalent change, that would be great. > Reshuffle is marked as deprecated Note that although this is true, it is widely used. Reshuffle forces data persistence in Dataflow, although it is a no-op in the Beam model, there is no current replacement for the operation it performs, and so it is used by many I/O transforms. That being said, withNumBuckets is only used with Pub/Sub Lite, and is safe to make correct. -Daniel On Fri, May 7, 2021 at 1:56 PM Brian Hulette <[email protected]> wrote: > I suspect this was unintentional. It looks like @Daniel Collins > <[email protected]> added the numBuckets parameter in > https://github.com/apache/beam/pull/11919, maybe they can confirm. > > Brian > > On Mon, May 3, 2021 at 5:17 PM Evan Galpin <[email protected]> wrote: > >> Hi all, >> >> While testing for a feature I’m implementing, I noticed that >> Reshuffle.AssignToShard[1] produces (N*2)-1 buckets, where N is the value >> of the user-defined numBuckets parameter. This is because the value of the >> variable having the remainder operator applied, hashOfShard, can be >> negative. >> >> Is it intentional to produce (N*2)-1 buckets? If not I’ll submit a small >> patch. >> >> I only worry about the implications of changing it for use cases already >> employing AssignToShard. Until recently (28 days ago[2]) the class was >> private, plus Reshuffle is marked as deprecated, so I imagine the impact >> would be low? Thoughts? >> >> Thanks, >> Evan >> >> [1] >> >> https://github.com/apache/beam/blob/abbe14f721327d51cce02876324e7feba98581e2/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Reshuffle.java#L160 >> [2] >> >> https://github.com/apache/beam/commit/abbe14f721327d51cce02876324e7feba98581e2 >> >
