[ https://issues.apache.org/jira/browse/BEAM-9487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17343438#comment-17343438 ]
Udi Meiri commented on BEAM-9487:
---------------------------------
Regarding deterministic coder checking, this is done in at least a couple of
places:
https://github.com/apache/beam/blob/51b37d885da67b9f0fb91e61b7be2b9598c6c947/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py#L838
(in the Dataflow runner, with a corresponding check in the direct runner)
and via the PValue.requires_deterministic_key_coder attribute.
Note that the pipeline might still run with a non-deterministic coder, but the
coder should then fail at runtime with an error ("Unable to deterministically
encode ...").
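To make the two failure points concrete, here is a minimal stdlib-only sketch of
a construction-time check versus a runtime encode failure. It does not use
Beam's API: ToyPickleCoder, ToyDeterministicCoder and check_key_coder are
hypothetical names, and the error text only mimics the message quoted above.

```python
import pickle


class ToyPickleCoder:
    """Hypothetical coder: encodes anything, makes no determinism promise."""

    def is_deterministic(self):
        return False

    def encode(self, value):
        return pickle.dumps(value)


class ToyDeterministicCoder:
    """Hypothetical coder: only encodes types with a stable byte form."""

    _SAFE_TYPES = (int, str, bytes)

    def is_deterministic(self):
        return True

    def encode(self, value):
        # Runtime failure path: reject values that cannot be encoded stably.
        if not isinstance(value, self._SAFE_TYPES):
            raise TypeError(
                "Unable to deterministically encode %r" % (value,))
        return repr(value).encode("utf-8")


def check_key_coder(coder, step_label):
    """Construction-time check: refuse a non-deterministic key coder."""
    if not coder.is_deterministic():
        raise ValueError(
            "Key coder for %s must be deterministic" % step_label)
    return coder
```

In this sketch, check_key_coder(ToyPickleCoder(), "GroupByKey") fails before
anything runs, while ToyDeterministicCoder passes the check and only raises
later, when encode() is handed a value such as a set.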
> GBKs on unbounded pcolls with global windows and no triggers should fail
> ------------------------------------------------------------------------
>
> Key: BEAM-9487
> URL: https://issues.apache.org/jira/browse/BEAM-9487
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core
> Reporter: Udi Meiri
> Assignee: Zachary Houfek
> Priority: P2
> Labels: EaseOfUse, starter
> Time Spent: 2h 50m
> Remaining Estimate: 0h
>
> This is according to "4.2.2.1 GroupByKey and unbounded PCollections" in
> https://beam.apache.org/documentation/programming-guide/:
> bq. If you do apply GroupByKey or CoGroupByKey to a group of unbounded
> PCollections without setting either a non-global windowing strategy, a
> trigger strategy, or both for each collection, Beam generates an
> IllegalStateException error at pipeline construction time.
> Example where this doesn't happen in Python SDK:
> https://stackoverflow.com/questions/60623246/merge-pcollection-with-apache-beam
> I also believe that this unit test should fail, since test_stream is
> unbounded, uses global window, and has no triggers.
> {code}
> def test_global_window_gbk_fail(self):
>   with TestPipeline() as p:
>     test_stream = TestStream()
>     _ = p | test_stream | GroupByKey()
> {code}
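The condition described in the quoted issue can be sketched as a
construction-time check. This is a stdlib-only illustration, not the Python
SDK's implementation: ToyWindowing and check_gbk_input are hypothetical names,
and raising ValueError (rather than the IllegalStateException the programming
guide mentions) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ToyWindowing:
    """Hypothetical stand-in for a PCollection's windowing strategy."""
    is_global_windows: bool = True
    has_default_trigger: bool = True


def check_gbk_input(is_bounded, windowing):
    """Raise if a GroupByKey on this input could never emit output."""
    if (not is_bounded and windowing.is_global_windows
            and windowing.has_default_trigger):
        raise ValueError(
            "GroupByKey cannot be applied to an unbounded PCollection with "
            "global windowing and a default trigger")
```

Under these assumptions, the test above corresponds to
check_gbk_input(False, ToyWindowing()), which raises; setting either a
non-global window or a non-default trigger lets the check pass.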
--
This message was sent by Atlassian Jira
(v8.3.4#803005)