Aditya Guru created BEAM-6743:
---------------------------------
Summary: Triggers not working for bounded data
Key: BEAM-6743
URL: https://issues.apache.org/jira/browse/BEAM-6743
Project: Beam
Issue Type: Bug
Components: sdk-java-core
Environment: Apache Beam 2.9.0 Java
Google Cloud Dataflow Runner
Reporter: Aditya Guru
pCollection
.apply(Window.<String>into(FixedWindows.of(Duration.millis(100)))
.triggering(Repeatedly.forever(AfterPane.elementCountAtLeast(1000)))
.discardingFiredPanes().withAllowedLateness(Duration.ZERO))
.apply(TextIO.write().withWindowedWrites().withNumShards(1).to("gs-path"));
Here pCollection is a *bounded* PCollection. I'm trying to break it into files
of roughly 1000 records each, but all I get is two files: one with 1000 records
and the other with the rest of the data.
If instead I do:
pCollection
.apply(Window.<String>into(new GlobalWindows())
.triggering(Repeatedly.forever(AfterPane.elementCountAtLeast(1000)))
.discardingFiredPanes().withAllowedLateness(Duration.ZERO))
.apply(TextIO.write().withWindowedWrites().withNumShards(1).to("gs-path"));
I get just one file.
Both of the above cases should conceptually have divided the records into
chunks of 1000, each written to its own file.
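To spell out the expectation: a minimal plain-Java sketch (independent of Beam, with a hypothetical Chunker helper) of the chunking behavior I expect the element-count trigger to produce, i.e. a new pane/file starting every 1000 records:

```java
import java.util.ArrayList;
import java.util.List;

public class Chunker {
    // Split records into chunks of at most `size` elements each,
    // mirroring what Repeatedly.forever(AfterPane.elementCountAtLeast(1000))
    // was expected to do for a bounded PCollection.
    static <T> List<List<T>> chunk(List<T> records, int size) {
        List<List<T>> chunks = new ArrayList<>();
        for (int i = 0; i < records.size(); i += size) {
            // subList is a view; each view covers one expected output file
            chunks.add(records.subList(i, Math.min(i + size, records.size())));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Integer> records = new ArrayList<>();
        for (int i = 0; i < 2500; i++) records.add(i);
        List<List<Integer>> chunks = chunk(records, 1000);
        // 2500 records should yield 3 files: 1000, 1000, 500
        System.out.println(chunks.size());
        System.out.println(chunks.get(2).size());
    }
}
```

So for 2500 bounded records I would expect three output files (1000/1000/500), not the two (or one) files described above.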
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)