Hey folks, I am using the Apache Beam framework in Java with the DirectRunner 
for local testing. When using GroupIntoBatches with a batch size of 1, it works 
perfectly fine, i.e. the output of the transform is consistent and as expected. 
But with a batch size > 1, the output PCollection contains fewer elements than 
it should.

Pipeline flow:
1. A transform for reading from Pub/Sub
2. A transform for making a KV out of the data
3. A fixed window transform of 1 second
4. The GroupIntoBatches transform
5. Finally, logging the resulting Iterables

The weird thing is that batch_size > 1 works great when running on 
DataflowRunner but not with DirectRunner. I think the issue might be with timer 
expiry, since GroupIntoBatches uses BagState internally to buffer elements.
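
To make the suspicion concrete, here is a small plain-Java model of what I 
think could be happening. It is only a sketch under my own assumptions, not 
Beam's actual implementation: the class BatchingSketch, its methods, and the 
flushOnWindowExpiry flag are all made up for illustration. Elements are 
buffered per key and 1-second window, a batch is emitted whenever the buffer 
reaches the batch size, and the leftover partial batch is emitted only if the 
end-of-window flush (the "timer") runs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model only: NOT Beam's implementation. All names are hypothetical.
public class BatchingSketch {

    /** A keyed value with an event timestamp in milliseconds. */
    public static final class Element {
        final String key;
        final String value;
        final long timestampMillis;

        public Element(String key, String value, long timestampMillis) {
            this.key = key;
            this.value = value;
            this.timestampMillis = timestampMillis;
        }
    }

    /**
     * Buffers values per (key, fixed window) and emits a batch each time a
     * buffer reaches batchSize. Partial batches left in a buffer are emitted
     * only when flushOnWindowExpiry is true, standing in for a window-expiry
     * timer that fires.
     */
    public static List<List<String>> batch(List<Element> input, int batchSize,
                                           long windowMillis, boolean flushOnWindowExpiry) {
        Map<String, List<String>> buffers = new LinkedHashMap<>();
        List<List<String>> output = new ArrayList<>();
        for (Element e : input) {
            // Assign the element to its fixed window by rounding the timestamp down.
            long windowStart = e.timestampMillis - Math.floorMod(e.timestampMillis, windowMillis);
            String bufferId = e.key + "@" + windowStart;
            List<String> buffer = buffers.computeIfAbsent(bufferId, k -> new ArrayList<>());
            buffer.add(e.value);
            if (buffer.size() == batchSize) {
                output.add(new ArrayList<>(buffer));
                buffer.clear();
            }
        }
        if (flushOnWindowExpiry) {
            // Emit whatever is still buffered, as an expiry timer would.
            for (List<String> leftover : buffers.values()) {
                if (!leftover.isEmpty()) {
                    output.add(new ArrayList<>(leftover));
                }
            }
        }
        return output;
    }

    /** Counts the elements across all emitted batches. */
    public static int countElements(List<List<String>> batches) {
        int total = 0;
        for (List<String> b : batches) {
            total += b.size();
        }
        return total;
    }

    public static void main(String[] args) {
        // Three elements fall into the first 1-second window, two into the second.
        List<Element> input = Arrays.asList(
                new Element("k", "a", 100L), new Element("k", "b", 200L),
                new Element("k", "c", 300L), new Element("k", "d", 1100L),
                new Element("k", "e", 1200L));
        System.out.println("batch size 2, flush runs:    "
                + countElements(batch(input, 2, 1000L, true)) + " of " + input.size());
        System.out.println("batch size 2, flush skipped: "
                + countElements(batch(input, 2, 1000L, false)) + " of " + input.size());
        System.out.println("batch size 1, flush skipped: "
                + countElements(batch(input, 1, 1000L, false)) + " of " + input.size());
    }
}
```

In this model, batch size 1 never leaves anything in the buffer, so nothing 
can be lost. With batch size 2, the third element of the first window is only 
emitted if the flush runs. That would match what I see, though I have not 
confirmed that this is what the DirectRunner actually does.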

Any help will be much appreciated.
