damccorm commented on code in PR #17724:
URL: https://github.com/apache/beam/pull/17724#discussion_r878339972


##########
sdks/go/pkg/beam/core/runtime/exec/sdf.go:
##########
@@ -647,6 +653,46 @@ func (n *ProcessSizedElementsAndRestrictions) Split(f float64) ([]*FullValue, []
        return p, r, nil
 }
 
+// Checkpoint splits the remaining work in a restriction into residuals to be resumed
+// later by the runner. This is done iff the underlying Splittable DoFn returns a resuming
+// ProcessContinuation. If the split occurs and the primary restriction is not marked as done
+// by the RTracker, the Checkpoint fails as this is a potential data-loss case.
+func (n *ProcessSizedElementsAndRestrictions) Checkpoint() ([]*FullValue, error) {

Review Comment:
   Could you please add some test cases? It should be fairly straightforward to model them off of the existing Split tests.
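   A sketch of what such a test could assert, as a self-contained toy rather than the real exec test harness (`offsetTracker`, `trySplit`, and `isDone` below are hypothetical stand-ins, not the actual exec or offsetrange types): a checkpoint is a split at fraction 0.0, so the test checks that the primary stops at the claimed position, the residual covers the remainder, and the tracker reports done.

   ```go
   package main

   import "fmt"

   // offsetTracker is a minimal stand-in for an offset-range restriction tracker.
   type offsetTracker struct {
   	start, end, claimed int64
   }

   // trySplit at fraction 0.0 truncates the primary at the claimed position and
   // returns the unprocessed remainder as the residual.
   func (t *offsetTracker) trySplit(fraction float64) (primary, residual [2]int64) {
   	splitPt := t.claimed // fraction 0.0: nothing past the claimed point stays in the primary
   	primary = [2]int64{t.start, splitPt}
   	residual = [2]int64{splitPt, t.end}
   	t.end = splitPt
   	return primary, residual
   }

   // isDone reports whether all work in the (possibly truncated) range is claimed.
   func (t *offsetTracker) isDone() bool {
   	return t.claimed >= t.end
   }

   func main() {
   	rt := &offsetTracker{start: 0, end: 100, claimed: 25}
   	p, r := rt.trySplit(0.0)
   	// A Checkpoint test would make exactly these three assertions.
   	fmt.Println(p, r, rt.isDone()) // [0 25] [25 100] true
   }
   ```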



##########
sdks/go/pkg/beam/core/runtime/exec/sdf.go:
##########
@@ -647,6 +653,46 @@ func (n *ProcessSizedElementsAndRestrictions) Split(f float64) ([]*FullValue, []
        return p, r, nil
 }
 
+// Checkpoint splits the remaining work in a restriction into residuals to be resumed
+// later by the runner. This is done iff the underlying Splittable DoFn returns a resuming
+// ProcessContinuation. If the split occurs and the primary restriction is not marked as done
+// by the RTracker, the Checkpoint fails as this is a potential data-loss case.
+func (n *ProcessSizedElementsAndRestrictions) Checkpoint() ([]*FullValue, error) {
+       // Get the watermark state immediately so that we don't overestimate our current watermark.
+       var pWeState interface{}
+       var rWeState interface{}
+       rWeState = n.wesInv.Invoke(n.PDo.we)
+       pWeState = rWeState
+       // If we've processed elements, the initial watermark estimator state will be set.
+       // In that case we should hold the output watermark at that initial state so that we don't
+       // advance past where the current elements are holding the watermark.
+       if n.initWeS != nil {
+               pWeState = n.initWeS
+       }
+       addContext := func(err error) error {
+               return errors.WithContext(err, "Attempting checkpoint in ProcessSizedElementsAndRestrictions")
+       }
+
+       // Error checking.
+       if n.rt == nil {
+               return nil, addContext(errors.New("Restriction tracker missing."))
+       }
+       if err := n.rt.GetError(); err != nil {
+               return nil, addContext(err)
+       }
+
+       _, r, err := n.singleWindowSplit(0.0, pWeState, rWeState)
+       if err != nil {
+               return nil, addContext(err)
+       }

Review Comment:
   
   ```suggestion
        _, r, err := n.Split(0.0)
   ```
   
   I like moving this into its own function in sdf.go, but that doesn't mean we can't still leverage Split.



##########
sdks/go/pkg/beam/core/runtime/exec/sdf.go:
##########
@@ -647,6 +653,46 @@ func (n *ProcessSizedElementsAndRestrictions) Split(f float64) ([]*FullValue, []
        return p, r, nil
 }
 
+// Checkpoint splits the remaining work in a restriction into residuals to be resumed
+// later by the runner. This is done iff the underlying Splittable DoFn returns a resuming
+// ProcessContinuation. If the split occurs and the primary restriction is not marked as done
+// by the RTracker, the Checkpoint fails as this is a potential data-loss case.
+func (n *ProcessSizedElementsAndRestrictions) Checkpoint() ([]*FullValue, error) {
+       // Get the watermark state immediately so that we don't overestimate our current watermark.
+       var pWeState interface{}
+       var rWeState interface{}
+       rWeState = n.wesInv.Invoke(n.PDo.we)
+       pWeState = rWeState
+       // If we've processed elements, the initial watermark estimator state will be set.
+       // In that case we should hold the output watermark at that initial state so that we don't
+       // advance past where the current elements are holding the watermark.
+       if n.initWeS != nil {
+               pWeState = n.initWeS
+       }
+       addContext := func(err error) error {
+               return errors.WithContext(err, "Attempting checkpoint in ProcessSizedElementsAndRestrictions")
+       }
+
+       // Error checking.
+       if n.rt == nil {
+               return nil, addContext(errors.New("Restriction tracker missing."))
+       }
+       if err := n.rt.GetError(); err != nil {
+               return nil, addContext(err)
+       }
+
+       _, r, err := n.singleWindowSplit(0.0, pWeState, rWeState)
+       if err != nil {
+               return nil, addContext(err)
+       }
+
+       if !n.rt.IsDone() {
+               return nil, addContext(errors.New("Primary restriction is not done, data may be lost as a result"))

Review Comment:
   We should still provide more context here - important pieces of information are:
   
   1) This happened during a self checkpoint
   2) That is probably a problem with their TrySplit logic - they should never have a restriction that isn't done after splitting at 0.0
   
   Also, because this is an error, it will kill pipeline execution (so data loss isn't really a concern)
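   A hedged sketch of an error message carrying both pieces of context (the helper name and exact wording below are illustrative, not actual exec package code):

   ```go
   package main

   import "fmt"

   // checkpointNotDoneErr is a hypothetical helper showing the kind of context
   // the review asks for: it names the self-checkpoint and points at the user's
   // TrySplit implementation as the likely culprit.
   func checkpointNotDoneErr() error {
   	return fmt.Errorf("self-checkpointing split at fraction 0.0 left a primary " +
   		"restriction that is not done; this likely indicates a bug in the RTracker's " +
   		"TrySplit implementation, which should leave no remaining work in the primary " +
   		"when splitting at 0.0")
   }

   func main() {
   	fmt.Println(checkpointNotDoneErr())
   }
   ```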



##########
sdks/go/pkg/beam/core/runtime/exec/sdf.go:
##########
@@ -647,6 +653,46 @@ func (n *ProcessSizedElementsAndRestrictions) Split(f float64) ([]*FullValue, []
        return p, r, nil
 }
 
+// Checkpoint splits the remaining work in a restriction into residuals to be resumed
+// later by the runner. This is done iff the underlying Splittable DoFn returns a resuming
+// ProcessContinuation. If the split occurs and the primary restriction is not marked as done
+// by the RTracker, the Checkpoint fails as this is a potential data-loss case.
+func (n *ProcessSizedElementsAndRestrictions) Checkpoint() ([]*FullValue, error) {
+       // Get the watermark state immediately so that we don't overestimate our current watermark.
+       var pWeState interface{}
+       var rWeState interface{}
+       rWeState = n.wesInv.Invoke(n.PDo.we)
+       pWeState = rWeState
+       // If we've processed elements, the initial watermark estimator state will be set.
+       // In that case we should hold the output watermark at that initial state so that we don't
+       // advance past where the current elements are holding the watermark.
+       if n.initWeS != nil {
+               pWeState = n.initWeS
+       }

Review Comment:
   If you do keep this and don't rely on Split, you can get rid of all the pWeState logic and just set it to nil, since we're discarding the primary watermark anyways.



##########
sdks/go/pkg/beam/core/sdf/sdf.go:
##########
@@ -73,8 +73,8 @@ type RTracker interface {
        // reason), then this function returns nil as the residual.
        //
        // If the split fraction is 0 (e.g. a self-checkpointing split) TrySplit() should return either
-       // a nil primary or an RTracker that is both bounded and has size 0. This ensures that there is
-       // no data that is lost by not being rescheduled for execution later.
+       // a nil primary or a restriction that represents no remaining work. This will ensure that there
+       // is no data loss.

Review Comment:
   Really we don't care about the primary that they return, right? The thing we really care about is that they've correctly set their RTracker such that IsDone() returns true.
   
   Also, the current consequence isn't data loss, it's a failed pipeline I think.
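   A toy sketch of that contract (`claimTracker` is an illustrative stand-in, not the real sdf.RTracker interface): after a fraction-0.0 split the returned primary can even be nil; the property the framework relies on is that IsDone() now reports true.

   ```go
   package main

   import "fmt"

   // claimTracker is a minimal stand-in for a tracker over an offset range.
   type claimTracker struct {
   	pos, end int64
   }

   // TrySplit at fraction 0.0 truncates the range at the current position and
   // returns a nil primary; the primary value is incidental, while the
   // post-condition that matters is IsDone() == true.
   func (t *claimTracker) TrySplit(fraction float64) (primary, residual interface{}) {
   	residual = [2]int64{t.pos, t.end}
   	t.end = t.pos // no work remains in the truncated primary range
   	return nil, residual
   }

   // IsDone reports whether every position in the (possibly truncated) range
   // has been claimed.
   func (t *claimTracker) IsDone() bool {
   	return t.pos >= t.end
   }

   func main() {
   	t := &claimTracker{pos: 40, end: 100}
   	p, r := t.TrySplit(0.0)
   	fmt.Println(p, r, t.IsDone()) // <nil> [40 100] true
   }
   ```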



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
