Can you share some more details? What is the expected output and what
output are you seeing?

On Thu, Feb 27, 2020 at 9:39 AM Vasu Gupta <[email protected]> wrote:

> Hey folks, I am using Apache beam Framework in Java with Direction Runner
> for local testing purposes. When using GroupIntoBatches with batch size 1
> it works perfectly fine i.e. the output of the transform is consistent and
> as expected. But when using with batch size > 1 the output Pcollection has
> less data than it should be.
>
> Pipeline flow:
> 1. A Transform for reading from pubsub
> 2. Transform for making a KV out of the data
> 3. A Fixed Window transform of 1 second
> 4. Applying GroupIntoBatches transform
> 5. And last, Logging the resulting Iterables.
>
> Weird thing is that it batch_size > 1 works great when running on
> DataflowRunner but not with DirectRunner. I think the issue might be with
> Timer Expiry since GroupIntoBatches uses BagState internally.
>
> Any help will be much appreciated.
>

Reply via email to