On Wed, Feb 21, 2024 at 12:48 AM Jan Lukavský <je...@seznam.cz> wrote:
>
> Hi,
>
> I have left a note regarding the proposed splitting of the batch and
> streaming expansion of this transform. In general, a need for such a
> split triggers doubts in me. It signals that either
>
> a) the transform does something it should not, or
>
> b) the Beam model is not complete in terms of being "unified".
>
> The problem described in the document is that in the batch case
> timers are not fired appropriately.
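For intuition, here is a toy, runner-agnostic simulation of why firing event-time timers correctly in batch becomes straightforward once bounded input is sorted by timestamp: the watermark can simply track the current element's timestamp, and the final watermark advance at end of input flushes the rest. (This is only an illustration; `process_sorted` and its signature are made up for this sketch and are not Beam APIs.)

```python
import heapq

def process_sorted(events, timer_delay):
    """Toy model of a batch stateful DoFn over timestamp-sorted input.

    `events` is a list of (timestamp, value) pairs sorted by timestamp.
    Each element sets an event-time timer at timestamp + timer_delay.
    Returns the firing times of timers, in the order they fired.
    """
    pending = []  # min-heap of timer firing times
    fired = []
    for ts, _value in events:
        # Input is sorted, so the watermark can advance to this element's
        # timestamp; fire every timer that is now due, in time order.
        while pending and pending[0] <= ts:
            fired.append(heapq.heappop(pending))
        heapq.heappush(pending, ts + timer_delay)
    # Bounded input is exhausted: the watermark advances to infinity,
    # which fires all remaining timers.
    while pending:
        fired.append(heapq.heappop(pending))
    return fired
```

With unsorted input the watermark could not advance element by element like this, which is exactly the gap the sorting requirement (or @RequiresTimeSortedInput) closes.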
+1. The underlying flaw is that processing-time timers are not handled
correctly in batch, but should be (even if it means keeping workers
idle?). We should fix this.

> This is actually one of the
> motivations that led to the introduction of the @RequiresTimeSortedInput
> annotation and, though mentioned years ago as a question, I do not
> remember what arguments were used against enforcing sorting of inputs by
> timestamp in the batch stateful DoFn as a requirement in the model. That
> would enable the appropriate firing of timers while preserving the batch
> invariant, which is that no late data are allowed. IIRC there are
> runners that do this sorting by default (at least the sorting; I am not
> sure about the timers, but once inputs are sorted, firing timers is
> simple).
>
> A different question is whether this particular transform should maybe
> fire not by event time, but rather by processing time?

Yeah, I was reading all of these as processing time. Throttling by event
time doesn't make much sense.

> On 2/21/24 03:00, Robert Burke wrote:
> > Thanks for the design, Damon! And thanks for collaborating with me on
> > getting a high-level textual description of the key implementation idea
> > down in writing. I think the solution is pretty elegant.
> >
> > I do have concerns about how different runners might handle
> > ProcessContinuations for the bounded input case. I know Dataflow famously
> > has two different execution modes under the hood, but I agree with the
> > principle that ProcessContinuation.Resume should largely be in line with
> > the expected delay, though it's by no means guaranteed AFAIK.
> >
> > We should also ensure this is linked from
> > https://s.apache.org/beam-design-docs if not already.
> >
> > Robert Burke
> > Beam Go Busybody
> >
> > On 2024/02/20 14:00:00 Damon Douglas wrote:
> >> Hello Everyone,
> >>
> >> The following describes a Throttle PTransform that limits element
> >> throughput to minimize downstream API overuse.
> >> Thank you for reading and for your
> >> valuable input.
> >>
> >> https://s.apache.org/beam-throttle-transform
> >>
> >> Best,
> >>
> >> Damon
> >>
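To make the processing-time framing concrete, here is a minimal, self-contained sketch of pacing logic a throttle could use: each emission reserves the next processing-time slot, and when an element arrives too early the caller is told how long to wait, which is the role ProcessContinuation.resume(delay) would play in a splittable DoFn. All names here (`Throttle`, `try_emit`) are hypothetical, not the actual design from the doc.

```python
class Throttle:
    """Toy processing-time throttle: at most rate_per_sec emissions per
    second of processing time.

    try_emit(now) returns 0.0 when the element may pass immediately, or
    the number of seconds the caller should wait before retrying (the
    analog of resuming with a delay).
    """

    def __init__(self, rate_per_sec):
        self.interval = 1.0 / rate_per_sec
        self.next_free = 0.0  # earliest processing time of the next emit

    def try_emit(self, now):
        if now >= self.next_free:
            # Emit now and reserve the next slot one interval later.
            self.next_free = now + self.interval
            return 0.0
        # Too early: report the remaining delay instead of emitting.
        return self.next_free - now
```

Because the decision depends only on processing time, the same logic applies unchanged to bounded and unbounded inputs, which is the unification Jan is asking for.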