Reshuffle should emit exactly as many elements as it receives. The GBK
inside Reshuffle may output slightly fewer elements because of key
collisions, but the ExpandIterable step should restore the count. Do we
have counts for that step's output? (I will say that seems to be an
extraordinarily high number of collisions.)
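
To make the collision point concrete, here is a rough sketch in plain
Python. It is not Beam's actual Reshuffle code, only a simulation of the
three steps as I understand them; the function name, the num_keys
parameter, and the fixed seed are all made up for illustration. It shows
why the grouping step can report fewer outputs than inputs while the
expanded output still matches the input count.

```python
import random
from collections import defaultdict


def reshuffle_sketch(elements, num_keys, seed=0):
    """Hypothetical simulation: random key -> group by key -> expand."""
    rng = random.Random(seed)
    # Step 1: pair each element with a random key; keys can collide.
    keyed = [(rng.randrange(num_keys), e) for e in elements]
    # Step 2: group by key, roughly what a GBK does. This yields one
    # output per distinct key, so its count can be below the input count.
    grouped = defaultdict(list)
    for key, element in keyed:
        grouped[key].append(element)
    # Step 3: expand each grouped iterable back into single elements.
    expanded = [e for values in grouped.values() for e in values]
    return len(grouped), expanded


elements = list(range(1000))
num_groups, output = reshuffle_sketch(elements, num_keys=100)
# num_groups is at most 100 here, while len(output) equals len(elements).
print(num_groups, len(output))
```

If the expanded count in the real pipeline is lower than the input count,
that would point at something other than key collisions.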

On Fri, May 29, 2020 at 3:34 PM Daniel Oliveira <danolive...@google.com>
wrote:

> Hi dev list,
>
> While answering Stack Overflow questions I stumbled onto this:
> https://stackoverflow.com/questions/62017572/beam-java-dataflow-bigquery-streaming-insert-groupbykey-reducing-elements
>
> The user's pipeline seems to have a Reshuffle outputting fewer elements
> than it received, inside a BigQuery streaming insert. This looks like a bug
> to me since I assume Reshuffle should always be outputting unchanged
> elements, and I read through the code and as far as I can tell this
> shouldn't be happening. But I'm not too familiar with the code in question
> so I was hoping someone else with more context on it could help confirm.
>
> Thanks,
> Daniel Oliveira
>
