[
https://issues.apache.org/jira/browse/BEAM-12075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17312564#comment-17312564
]
Kenneth Knowles commented on BEAM-12075:
----------------------------------------
To isolate the error, it will help to try this on other runners. I believe that
with {{--runner=FlinkRunner}} you will default to a local Flink so it is as
easy to use as the DirectRunner. It will be helpful to know if behavior is
different across runners. You could also try on Dataflow, since it seems you
have experience launching Dataflow jobs.
> GroupByKey doesn't seem to work with FixedWindows
> -------------------------------------------------
>
> Key: BEAM-12075
> URL: https://issues.apache.org/jira/browse/BEAM-12075
> Project: Beam
> Issue Type: Bug
> Components: extensions-java-gcp
> Affects Versions: 2.28.0
> Environment: Java 8,
> Reporter: Tianzi Cai
> Priority: P2
> Labels: Grouping, PubSubIO, Windowing
>
> After applying `FixedWindows` on a streaming source, a `GroupByKey` operation
> won't emit keyed elements in a window.
> This example without `GroupByKey` prints all the windowed elements:
>
> {noformat}
> pipeline
> .apply("ReadFromPubsub",
> PubsubIO.readStrings().fromSubscription(subscriptionPath))
> .apply(Window.into(FixedWindows.of(Duration.standardSeconds(5L))))
> .apply(WithKeys.of("bobcat"))
> .apply(MapElements.into(TypeDescriptors.nulls()).via(
> (KV<String, String> pair) -> {
> LOG.info("Key: " + pair.getKey() + "\tValue: " + pair.getValue());
> return null;
> }
> ));{noformat}
>
> This example with `GroupByKey` doesn't emit anything:
>
> {noformat}
> pipeline
> .apply("ReadFromPubsub",
> PubsubIO.readStrings().fromSubscription(subscriptionPath))
> .apply(Window.into(FixedWindows.of(Duration.standardSeconds(5L))))
> .apply(WithKeys.of("bobcat"))
> .apply(GroupByKey.create())
> .apply(FlatMapElements.into(TypeDescriptors.nulls()).via(
> (KV<String, Iterable<String>> pair) -> {
> pair.getValue().forEach(message -> LOG.info("Message: " + message));
> return null;
> }
> ));{noformat}
>
> I'm using DirectRunner. The same logic works for Python using both the
> DirectRunner and DataflowRunner.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)