kennknowles commented on issue #23379:
URL: https://github.com/apache/beam/issues/23379#issuecomment-1261455416

   Just commenting with my thought process around this, based on reading the code. (Most of this could/should be in public docs, but I don't know that it is.)
   
    - A window "is" a key with an end time for the purposes of aggregation or stateful processing (see the sketch just after this list).
    - Windows are equally meaningful in batch and streaming, and _all_ windows 
may be processed simultaneously in either mode. Concepts like "the latest 
window that is still open" do not exist.
    - To the extent possible, allowing only for nondeterministic variation, batch backfill and experiments should yield results equivalent to streaming.
    - There is no such thing as "late" in the current model outside the context of aggregation or stateful processing. For an ordinary ParDo, observing the watermark actually creates a side channel of aggregation (the watermark is itself an aggregation), breaking the stateless per-element invariant.
    - "Late" means that downstream already has the "complete" result, so it may need to do some fixup work.
    - When writing files with windowed writes, the aggregation that has an end time is the grouping of elements into the same file (or file set); the sketch below shows this as well.
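   
   To make the first two points (and the windowed-write point) concrete, here is a minimal sketch of my own, not from this issue, assuming the Beam Java SDK; the class name and output path are placeholders. Windows are assigned from each element's event timestamp, the aggregation then effectively groups by (element, window), and the windowed write collects each window's elements into its own file set.
   
   ```java
   import org.apache.beam.sdk.Pipeline;
   import org.apache.beam.sdk.io.TextIO;
   import org.apache.beam.sdk.transforms.Count;
   import org.apache.beam.sdk.transforms.Create;
   import org.apache.beam.sdk.transforms.MapElements;
   import org.apache.beam.sdk.transforms.windowing.FixedWindows;
   import org.apache.beam.sdk.transforms.windowing.Window;
   import org.apache.beam.sdk.values.KV;
   import org.apache.beam.sdk.values.PCollection;
   import org.apache.beam.sdk.values.TimestampedValue;
   import org.apache.beam.sdk.values.TypeDescriptors;
   import org.joda.time.Duration;
   import org.joda.time.Instant;
   
   public class WindowAsKeySketch {
     public static void main(String[] args) {
       Pipeline p = Pipeline.create();
   
       // Elements carry event timestamps; which window an element lands in is decided
       // by that timestamp, not by when the pipeline happens to see the element.
       PCollection<String> words =
           p.apply(Create.timestamped(
               TimestampedValue.of("a", new Instant(0L)),        // 00:00
               TimestampedValue.of("a", new Instant(30_000L)),   // 00:30
               TimestampedValue.of("b", new Instant(90_000L)))); // 01:30
   
       // After windowing, the aggregation effectively groups by (element, window).
       // The window acts as part of the key, and its end time is what the watermark
       // is compared against when deciding whether a pane is on time or late.
       PCollection<KV<String, Long>> counts =
           words
               .apply(Window.<String>into(FixedWindows.of(Duration.standardMinutes(1))))
               .apply(Count.<String>perElement());
   
       // Windowed file writes: the aggregation with an end time is the grouping of
       // elements into the same file set per window. (The path is a placeholder.)
       counts
           .apply(MapElements.into(TypeDescriptors.strings())
               .via((KV<String, Long> kv) -> kv.getKey() + ": " + kv.getValue()))
           .apply(TextIO.write()
               .to("/tmp/window-as-key-sketch/counts")
               .withWindowedWrites()
               .withNumShards(1));
   
       p.run().waitUntilFinish();
     }
   }
   ```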
   
   I am having trouble fitting this request into that framing.
   
   But I have one way to start: when you talk about concepts like "the latest window that is still open", the time series you are mentally considering is the time series of when elements are ingested by the pipeline, not the time series of when the originating events happened. In that case you need the ability to adjust timestamps to be the ingestion time, for example by just using the worker clock, except that you want to do it with something slightly more principled.
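   
   As a minimal sketch of the "just use the worker clock" variant (my own illustration, not from this issue; the transform name is made up), assuming the Beam Java SDK:
   
   ```java
   import org.apache.beam.sdk.transforms.DoFn;
   import org.joda.time.Instant;
   
   /**
    * Hypothetical helper (the name is mine, not an existing Beam transform): re-stamps
    * each element with the worker's wall clock at the moment it is processed,
    * approximating "ingestion time". Downstream windowing then reflects when the
    * pipeline saw the element rather than when the originating event happened.
    */
   class WithIngestionTimestamps<T> extends DoFn<T, T> {
     @ProcessElement
     public void processElement(@Element T element, OutputReceiver<T> out) {
       // Instant.now() reads the worker clock. Moving an element's timestamp forward is
       // allowed; it is moving a timestamp earlier than the input's that requires
       // configuring allowed timestamp skew.
       out.outputWithTimestamp(element, Instant.now());
     }
   }
   ```
   
   You would apply it with ParDo.of(new WithIngestionTimestamps<>()) ahead of the windowing, so that the windows track when elements arrived rather than when the events happened.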
   
   This won't work with batch processing of an archive, of course.

