I asked the user to check whether the drop happened in just the GBK or
across the entire Reshuffle, and they confirmed it was the entire Reshuffle.
Their pipeline's final output was also missing some of the elements it was
expected to produce. I'm still asking the user for more info to make sure
this isn't a bug on the Dataflow side.
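
For anyone following along who hasn't looked at the internals: Reshuffle is
conceptually a random-key GroupByKey followed by re-emitting every grouped
value, so end to end it should preserve the element count. Below is a
minimal, simplified sketch of that shape (not the actual Beam implementation,
which also reifies timestamps and handles windowing; the class and step names
here are made up for illustration):

import java.util.concurrent.ThreadLocalRandom;
import org.apache.beam.sdk.coders.KvCoder;
import org.apache.beam.sdk.coders.VarIntCoder;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.GroupByKey;
import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;

// Simplified stand-in for Reshuffle: assign a random key, group, then
// re-emit every grouped value. Key collisions reduce the number of *groups*
// coming out of the GBK, but the expansion step emits one output per grouped
// value, so the end-to-end element count should be unchanged.
public class NaiveReshuffle<T> extends PTransform<PCollection<T>, PCollection<T>> {
  @Override
  public PCollection<T> expand(PCollection<T> input) {
    return input
        .apply("AssignRandomKey",
            ParDo.of(new DoFn<T, KV<Integer, T>>() {
              @ProcessElement
              public void processElement(@Element T element,
                  OutputReceiver<KV<Integer, T>> out) {
                out.output(KV.of(ThreadLocalRandom.current().nextInt(), element));
              }
            }))
        .setCoder(KvCoder.of(VarIntCoder.of(), input.getCoder()))
        .apply("GroupByRandomKey", GroupByKey.<Integer, T>create())
        .apply("ExpandIterable",
            ParDo.of(new DoFn<KV<Integer, Iterable<T>>, T>() {
              @ProcessElement
              public void processElement(@Element KV<Integer, Iterable<T>> kv,
                  OutputReceiver<T> out) {
                // One output per grouped value, regardless of key collisions.
                for (T value : kv.getValue()) {
                  out.output(value);
                }
              }
            }))
        .setCoder(input.getCoder());
  }
}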

On Fri, May 29, 2020 at 4:32 PM Robert Bradshaw <[email protected]> wrote:

> Reshuffle should be emitting exactly the same number of elements that it
> gets. The GBK inside Reshuffle may emit slightly fewer due to key
> collisions, but the ExpandIterable step should take care of this. Do we
> have counts for that output? (I will say that seems like an
> extraordinarily high number of collisions.)
>
> On Fri, May 29, 2020 at 3:34 PM Daniel Oliveira <[email protected]>
> wrote:
>
>> Hi dev list,
>>
>> While answering Stack Overflow questions I stumbled onto this:
>> https://stackoverflow.com/questions/62017572/beam-java-dataflow-bigquery-streaming-insert-groupbykey-reducing-elements
>>
>> The user's pipeline seems to have a Reshuffle outputting fewer elements
>> than it received, inside a BigQuery streaming insert. This looks like a
>> bug to me, since I assume Reshuffle should always output its elements
>> unchanged. I read through the code and, as far as I can tell, this
>> shouldn't be happening, but I'm not too familiar with the code in
>> question, so I was hoping someone with more context could help confirm.
>>
>> Thanks,
>> Daniel Oliveira
>>
>
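
To get the counts Robert asked about, one option (just a sketch, with
made-up counter and step names) is to wrap the Reshuffle with a pair of
pass-through DoFns that bump Beam metrics counters, then compare the two
counters in the Dataflow job metrics:

import org.apache.beam.sdk.metrics.Counter;
import org.apache.beam.sdk.metrics.Metrics;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.Reshuffle;

// Pass-through DoFn that increments a named counter for every element it
// sees, so element counts before and after the Reshuffle can be compared.
class CountElements<T> extends DoFn<T, T> {
  private final Counter counter;

  CountElements(String name) {
    this.counter = Metrics.counter(CountElements.class, name);
  }

  @ProcessElement
  public void processElement(@Element T element, OutputReceiver<T> out) {
    counter.inc();
    out.output(element);
  }
}

// Illustrative usage, assuming a PCollection named "rows":
// rows
//     .apply("CountBefore", ParDo.of(new CountElements<>("elements_before_reshuffle")))
//     .apply(Reshuffle.viaRandomKey())
//     .apply("CountAfter", ParDo.of(new CountElements<>("elements_after_reshuffle")));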
