[https://issues.apache.org/jira/browse/BEAM-11087?focusedWorklogId=667123&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-667123]

ASF GitHub Bot logged work on BEAM-11087:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 19/Oct/21 16:41
            Start Date: 19/Oct/21 16:41
    Worklog Time Spent: 10m 
      Work Description: lostluck commented on a change in pull request #15743:
URL: https://github.com/apache/beam/pull/15743#discussion_r732027485



##########
File path: sdks/go/pkg/beam/core/runtime/exec/window.go
##########
@@ -96,3 +96,23 @@ func (w *WindowInto) Down(ctx context.Context) error {
 func (w *WindowInto) String() string {
        return fmt.Sprintf("WindowInto[%v]. Out:%v", w.Fn, w.Out.ID())
 }
+
+// WindowMapper defines an interface maps windows from a main input window space
+// to windows from a side input window space. Used during side input materialization.
+type WindowMapper interface {
+       MapWindow(w typex.Window) (typex.Window, error)
+}
+
+type windowMapper struct {
+       wfn *window.Fn
+}
+
+func (f *windowMapper) MapWindow(w typex.Window) (typex.Window, error) {
+       candidates := assignWindows(f.wfn, w.MaxTimestamp())
+       if len(candidates) == 0 {
+               return nil, fmt.Errorf("failed to map main input window to side input window with WindowFn %v", f.wfn.String())
+       }
+       // Return latest candidate window in terms of event time (only relevant for sliding windows)
+       // Sliding windows append the latest window first in assignWindows.
+       return candidates[0], nil

Review comment:
       This is returning the 1st candidate. Is this correct? Shouldn't it be the last candidate, `candidates[len(candidates)-1]`?
   
   Python uses the last candidate:
   
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/sideinputs.py#L65
   
   And generates them like so, 
https://github.com/apache/beam/blob/aa4edda39ceb8d7a80f56bd37caa6233dba7de5d/sdks/python/apache_beam/transforms/window.py#L494
 
   
   which matches how we assign them in Go: 
https://github.com/apache/beam/blob/aa4edda39ceb8d7a80f56bd37caa6233dba7de5d/sdks/go/pkg/beam/core/runtime/exec/window.go#L72
   
   Java also does the same thing:
   
https://github.com/apache/beam/blob/aa4edda39ceb8d7a80f56bd37caa6233dba7de5d/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/SlidingWindows.java#L111
   
   But it constructs the window statically and calls it the "earliest window" instead of the "latest" window:
https://github.com/apache/beam/blob/aa4edda39ceb8d7a80f56bd37caa6233dba7de5d/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/SlidingWindows.java#L133
   
   What tipped me off to the inconsistency is that, in your unit test, the side input window ends much later than the fixed window, which doesn't make sense processing-wise: why wait for additional later data and delay main input processing until the watermark passes that later time?
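   A minimal, self-contained sketch of the mapping suggested above: candidates come back latest-first (as the PR comment says assignWindows orders them for sliding windows), so the earliest window is the last element. The `interval` type and the `assignSliding`/`mapToEarliest` helpers are hypothetical stand-ins for `window.IntervalWindow` and `assignWindows`, not Beam Go SDK API. It assumes no window offset, non-negative millisecond timestamps, and a max timestamp of end minus 1 ms.

```go
package main

import "fmt"

// interval is a hypothetical stand-in for window.IntervalWindow:
// a half-open range [Start, End) in milliseconds.
type interval struct {
	Start, End int64
}

// maxTimestamp treats the last timestamp in [Start, End) as End-1 (assumption).
func (w interval) maxTimestamp() int64 { return w.End - 1 }

// assignSliding lists every sliding window (no offset) that contains ts,
// latest start first. Assumes ts >= 0.
func assignSliding(ts, period, size int64) []interval {
	var out []interval
	for start := ts - ts%period; start > ts-size; start -= period {
		out = append(out, interval{start, start + size})
	}
	return out
}

// mapToEarliest maps a main input window to the earliest side input window
// containing its max timestamp: the last candidate, since the list is latest-first.
func mapToEarliest(mainWin interval, period, size int64) (interval, error) {
	candidates := assignSliding(mainWin.maxTimestamp(), period, size)
	if len(candidates) == 0 {
		return interval{}, fmt.Errorf("no sliding window contains %v", mainWin)
	}
	return candidates[len(candidates)-1], nil
}

func main() {
	w, err := mapToEarliest(interval{0, 600}, 500, 1000)
	if err != nil {
		panic(err)
	}
	fmt.Println(w) // {0 1000}
}
```

   Taking `candidates[0]` in the same sketch would return {500 1500}, which is the value the current unit test expects.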

##########
File path: sdks/go/pkg/beam/core/runtime/exec/window_test.go
##########
@@ -113,3 +113,47 @@ func TestAssignWindow(t *testing.T) {
                }
        }
 }
+
+func TestMapWindow(t *testing.T) {
+       tests := []struct {
+               name     string
+               wfn      *window.Fn
+               in       typex.Window
+               expected typex.Window
+       }{
+               {
+                       "interval to global",
+                       window.NewGlobalWindows(),
+                       window.IntervalWindow{Start: 0, End: 1000},
+                       window.GlobalWindow{},
+               },
+               {
+                       "global to global",
+                       window.NewGlobalWindows(),
+                       window.GlobalWindow{},
+                       window.GlobalWindow{},
+               },
+               {
+                       "interval to interval",
+                       window.NewFixedWindows(1000 * time.Millisecond),
+                       window.IntervalWindow{Start: 0, End: 100},
+                       window.IntervalWindow{Start: 0, End: 1000},
+               },
+               {
+                       "interval to sliding",
+                       window.NewSlidingWindows(500*time.Millisecond, 1000*time.Millisecond),
+                       window.IntervalWindow{Start: 0, End: 600},
+                       window.IntervalWindow{Start: 500, End: 1500},

Review comment:
       The "earliest" window should be 0-1000 here, I think.
   
   Since this one is trickier, I suggest we copy the test values that Java's unit test uses (minus the offsets, which we don't support at present):
   
   
https://github.com/apache/beam/blob/4b7b74673b647c8d964b4877a8d66d47096acce4/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/windowing/SlidingWindowsTest.java#L175
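   Worked out for the "interval to sliding" case, assuming no window offset (so window starts are multiples of the period p) and that the main window [0, 600) is mapped via its max timestamp, taken here as end minus 1 ms (t = 599):

```latex
[kp,\ kp + s) \ni t \iff t - s < kp \le t, \qquad p = 500,\ s = 1000,\ t = 599
\\
-401 < 500k \le 599 \implies k \in \{0, 1\}
\\
k = 1:\ [500, 1500)\ (\text{latest}), \qquad k = 0:\ [0, 1000)\ (\text{earliest})
```

   So the expected value of 500-1500 in the test is the latest candidate, and 0-1000 is the earliest.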

##########
File path: sdks/go/test/integration/primitives/windowinto.go
##########
@@ -93,6 +94,41 @@ func WindowSums_Lifted(s beam.Scope) {
        WindowSums(s.Scope("Lifted"), stats.SumPerKey)
 }
 
+// ValidateWindowedSideInputs checks that side inputs have accurate windowing information when used.
+func ValidateWindowedSideInputs(s beam.Scope) {
+       timestampedData := beam.ParDo(s, &createTimestampedData{Data: []int{1, 2, 3}}, beam.Impulse(s))
+
+       timestampedData = beam.DropKey(s, timestampedData)
+
+       windowSize := 1 * time.Second
+
+       validateSums := func(s beam.Scope, wfn, sideFn *window.Fn, in, side beam.PCollection, expected ...interface{}) {
+               wData := beam.WindowInto(s, wfn, in)
+               wSide := beam.WindowInto(s, sideFn, side)
+
+               sums := beam.ParDo(s, sumSideInputs, wData, beam.SideInput{Input: wSide})
+
+               sums = beam.WindowInto(s, window.NewGlobalWindows(), sums)
+
+               passert.Equals(s, sums, expected...)
+       }
+
+       validateSums(s.Scope("Fixed-Global"), window.NewFixedWindows(windowSize), window.NewGlobalWindows(), timestampedData, timestampedData, 7, 8, 9)
+       validateSums(s.Scope("Fixed-Same"), window.NewFixedWindows(windowSize), window.NewFixedWindows(windowSize), timestampedData, timestampedData, 2, 4, 6)
+       validateSums(s.Scope("Fixed-Big"), window.NewFixedWindows(windowSize), window.NewFixedWindows(10*time.Second), timestampedData, timestampedData, 7, 8, 9)
+       validateSums(s.Scope("Fixed-Sliding"), window.NewFixedWindows(windowSize), window.NewSlidingWindows(windowSize, 2*windowSize), timestampedData, timestampedData, 7, 4, 6)
+       validateSums(s.Scope("Sliding-Fixed"), window.NewSlidingWindows(windowSize, 2*windowSize), window.NewFixedWindows(windowSize), timestampedData, timestampedData, 2, 3, 4, 5, 6, 3)

Review comment:
       Just so I understand what's going on with these sums (which probably deserve a clarifying comment, since they're harder to work out quickly than the plain fixed ones):
   
   For Fixed-Sliding
   Main: with window size 1, each window contains 1 element (1, 2, 3).
   Side: window size 2 with a period of 1, so windows start 1 second apart. So we have [1], [1, 2], [2, 3], [3].
   With earliest windows, what gets computed here should be:
   (1, [1])  = 2
   (2, [1, 2]) = 5
   (3, [2, 3]) = 8
   
   What we have here does at least match what's implemented (latest windows):
   (1, [1, 2])  = 4
   (2, [2, 3]) = 7
   (3, [3]) = 6
   
   For Sliding-Fixed:
   We have (main window contents, side input contents):
   ([1], [1]) = 2
   ([1, 2], [2]) = 3, 4
   ([2, 3], [3]) = 5, 6
   ([3], [] ) = 3
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.



Issue Time Tracking
-------------------

    Worklog Id:     (was: 667123)
    Time Spent: 1h 10m  (was: 1h)

> [Go SDK] Validate Side Input behavior WRT windows 
> --------------------------------------------------
>
>                 Key: BEAM-11087
>                 URL: https://issues.apache.org/jira/browse/BEAM-11087
>             Project: Beam
>          Issue Type: Sub-task
>          Components: sdk-go
>            Reporter: Robert Burke
>            Assignee: Jack McCluskey
>            Priority: P3
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> DoFns with Side Inputs implicitly observe windows, as Side Inputs are scoped 
> to the current window, a powerful feature of Beam.
> Ideally, this would be unit tested in the exec package by creating a fake 
> side input adapter (or using the real one) to target the implementation more 
> directly 
> [https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/runtime/exec/sideinput.go#L34]
>  and exercising the ParDo code for Side Input handling directly.
> [https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/runtime/exec/pardo.go#L38]
>  
> It should then be possible to bolster that with appropriate mock windows to 
> ensure that side inputs are configured correctly.
> Alternatively, this behavior could be validated with an integration test 
> against real runners, since the complexity around Side Inputs makes unit 
> testing a challenge. (While side input code could be tested in that fashion, 
> it's likely dramatically simpler to write the integration test.)
> Some light tests with Side Inputs already exist, but they're purely in the 
> Global Window. Add tests for non-global windows to ensure that Side Inputs 
> are scoped correctly.
> Integration test directory: 
> [https://github.com/apache/beam/tree/master/sdks/go/test/integration/primitives]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
