[ 
https://issues.apache.org/jira/browse/BEAM-9487?focusedWorklogId=662148&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-662148
 ]

ASF GitHub Bot logged work on BEAM-9487:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 07/Oct/21 23:21
            Start Date: 07/Oct/21 23:21
    Worklog Time Spent: 10m 
      Work Description: kennknowles commented on a change in pull request #15603:
URL: https://github.com/apache/beam/pull/15603#discussion_r723705851



##########
File path: sdks/python/apache_beam/transforms/ptransform_test.py
##########
@@ -505,12 +504,10 @@ def test_group_by_key_allow_unsafe_triggers(self):
           | beam.Create([(1, 1), (1, 2), (1, 3), (1, 4)])
           | WindowInto(
               window.GlobalWindows(),
-              trigger=trigger.AfterCount(5),
+              trigger=trigger.AfterCount(4),
               accumulation_mode=trigger.AccumulationMode.ACCUMULATING)
           | beam.GroupByKey())
-      # We need five, but it only has four - Displays how this option is
-      # dangerous.
-      assert_that(pcoll, is_empty())
+      assert_that(pcoll, equal_to([(1, [1, 2, 3, 4])]))

Review comment:
       You ought to be able to get output from `AfterCount(5)`, right?

Review comment:
       Yes, it sounds like it. We would benefit from:
   
   1. A test of an unsafe trigger that does fire, confirming that it is allowed and works.
   2. A test of an unsafe trigger dropping subsequent data (less important to test).
   3. A test of an arbitrary trigger (it doesn't matter which kind) that never fires, confirming that the elements still come out when the window is garbage collected.
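
   To make the three cases concrete, here is a rough sketch in plain Python. It is a toy model of a one-shot count trigger on a single key, not Beam code and not the test from this PR. The function name `simulate_after_count` and the `flush_at_gc` flag are hypothetical, and the flush-at-GC behaviour is an assumption taken from the expectation stated in this thread.

```python
from typing import List, Tuple


def simulate_after_count(
    values: List[int], count: int, flush_at_gc: bool = True
) -> Tuple[List[List[int]], List[int]]:
    """Toy model of a one-shot count trigger for one key in one window.

    Returns (emitted_panes, dropped_values). Hypothetical helper: it only
    mirrors the behaviour described in the review comments above.
    """
    panes: List[List[int]] = []
    dropped: List[int] = []
    buffer: List[int] = []
    finished = False
    for value in values:
        if finished:
            # Case 2: the trigger already fired and finished, so later
            # elements for this window are dropped.
            dropped.append(value)
            continue
        buffer.append(value)
        if len(buffer) >= count:
            # Case 1: the trigger fires once the threshold is reached.
            panes.append(list(buffer))
            buffer.clear()
            finished = True
    if buffer and not finished and flush_at_gc:
        # Case 3 (assumed): the trigger never fired, but the buffered
        # elements are emitted when the window is garbage collected.
        panes.append(list(buffer))
    return panes, dropped


if __name__ == "__main__":
    print(simulate_after_count([1, 2, 3, 4], 4))        # case 1
    print(simulate_after_count([1, 2, 3, 4, 5, 6], 4))  # case 2
    print(simulate_after_count([1, 2, 3, 4], 5))        # case 3
```

   Calling it with `flush_at_gc=False` and a threshold of 5 on four elements yields no panes, which corresponds to the `is_empty()` assertion that the diff removes.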





Issue Time Tracking
-------------------

    Worklog Id:     (was: 662148)
    Time Spent: 34h 40m  (was: 34.5h)

> GBKs on unbounded pcolls with global windows and no triggers should fail
> ------------------------------------------------------------------------
>
>                 Key: BEAM-9487
>                 URL: https://issues.apache.org/jira/browse/BEAM-9487
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core
>            Reporter: Udi Meiri
>            Assignee: Zachary Houfek
>            Priority: P1
>              Labels: EaseOfUse, starter
>             Fix For: 2.34.0
>
>          Time Spent: 34h 40m
>  Remaining Estimate: 0h
>
> This follows from section "4.2.2.1 GroupByKey and unbounded PCollections" of 
> https://beam.apache.org/documentation/programming-guide/:
> "If you do apply GroupByKey or CoGroupByKey to a group of unbounded 
> PCollections without setting either a non-global windowing strategy, a 
> trigger strategy, or both for each collection, Beam generates an 
> IllegalStateException error at pipeline construction time."
> Example where this doesn't happen in Python SDK: 
> https://stackoverflow.com/questions/60623246/merge-pcollection-with-apache-beam
> I also believe that this unit test should fail, since test_stream is 
> unbounded, uses the global window, and has no triggers.
> {code}
>   def test_global_window_gbk_fail(self):
>     with TestPipeline() as p:
>       test_stream = TestStream()
>       _ = p | test_stream | GroupByKey()
> {code}


