lostluck commented on issue #33815:
URL: https://github.com/apache/beam/issues/33815#issuecomment-2842620247

   @shunping's investigations show that Prism isn't checkpointing the Unbounded 
SDF periodically, which is probably what Dataflow's implementation does. 
Basically the PeriodicImpulse is just running to try and catch up to "now". 
   
   The split algorithm Prism uses is very simple and tries to avoid things 
processing too slow, which is defined as not triggering downstream side 
effects. 
   
   
https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/runners/prism/internal/stage.go#L223
   
   Essentially, it doesn't try to tell the SDK to stop.
   
   The Go SDK's periodic impulse isn't very clever: it tries to catch up to 
"now" and doesn't try to self checkpoint until it has. Likely because it was 
developed against Dataflow which will aggressively force checpointing.
   
   
https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/transforms/periodic/periodic.go#L107
   
   Basically, the "Start Periodic Impulse at the beginning of time" is an 
antipattern. But it is allowed by the semantics of the model, so prism must 
need to deal with it.
   
   While the Go SDK's periodic impulse could be improved to self checkpoint 
earlier (and probably should), prism should also be improved to more gracefully 
handle this. The checkpointing is required to allow the transforms' output 
watermark to advance, enabling the downstream flattens to run.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to