Can you share some more details? What is the expected output and what output are you seeing?
On Thu, Feb 27, 2020 at 9:39 AM Vasu Gupta <[email protected]> wrote: > Hey folks, I am using Apache beam Framework in Java with Direction Runner > for local testing purposes. When using GroupIntoBatches with batch size 1 > it works perfectly fine i.e. the output of the transform is consistent and > as expected. But when using with batch size > 1 the output Pcollection has > less data than it should be. > > Pipeline flow: > 1. A Transform for reading from pubsub > 2. Transform for making a KV out of the data > 3. A Fixed Window transform of 1 second > 4. Applying GroupIntoBatches transform > 5. And last, Logging the resulting Iterables. > > Weird thing is that it batch_size > 1 works great when running on > DataflowRunner but not with DirectRunner. I think the issue might be with > Timer Expiry since GroupIntoBatches uses BagState internally. > > Any help will be much appreciated. >
