On Wed, Feb 21, 2024 at 12:48 AM Jan Lukavský <je...@seznam.cz> wrote:
>
> Hi,
>
> I have left a note regarding the proposed splitting of batch and
> streaming expansion of this transform. In general, a need for such split
> triggers doubts in me. This signals that either
>
>   a) the transform does something it should not, or
>
>   b) the Beam model is not complete in terms of being "unified"
>
> The problem that is described in the document is that in the batch case
> timers are not fired appropriately.

+1. The underlying flaw is that processing time timers are not handled
correctly in batch, but should be (even if it means keeping workers
idle?). We should fix this.
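To make the "keeping workers idle" option concrete: honoring a processing-time timer in batch could simply mean the worker waits out the timer's wall-clock deadline before firing it. A minimal, hypothetical sketch (not Beam API; the clock and sleep hooks are injected so the pacing logic is testable):

```python
import time

def fire_processing_time_timers(timers, now=time.monotonic, sleep=time.sleep):
    """Fire processing-time timers in batch by waiting out each deadline.

    `timers` is a list of (fire_at_seconds, callback) pairs measured on
    the `now()` clock. This is only a sketch of what honoring
    processing-time timers in a batch runner could mean: the worker sits
    idle until each timer's wall-clock deadline passes, then fires it.
    """
    for fire_at, callback in sorted(timers, key=lambda t: t[0]):
        delay = fire_at - now()
        if delay > 0:
            sleep(delay)  # keep the worker idle until the deadline
        callback()
```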

> This is actually one of the
> motivations that led to the introduction of the @RequiresTimeSortedInput
> annotation and, though mentioned years ago as a question, I do not
> remember what arguments were used against enforcing sorting of inputs by
> timestamp in the batch stateful DoFn as a requirement in the model. That
> would enable the appropriate firing of timers while preserving the batch
> invariant that no late data is allowed. IIRC there are
> runners that do this sorting by default (at least the sorting, not sure
> about the timers, but once inputs are sorted, firing timers is simple).
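Indeed, once batch input is timestamp-sorted, event-time timer firing falls out naturally: the local watermark just advances with the input, and any timer due before the next element's timestamp fires. A hypothetical pure-Python sketch of that idea (the `process`/`on_timer` callback shape is invented for illustration, not Beam's actual API):

```python
import heapq

def process_sorted(elements, process, on_timer):
    """Process (timestamp, value) pairs already sorted by timestamp,
    firing event-time timers as the local watermark advances.

    `process(value, set_timer)` may register event-time timers via
    set_timer(fire_at). This sketch shows why timestamp-sorted batch
    input makes timer firing simple: the watermark advances in lockstep
    with the input, and no late data can arrive behind it.
    """
    pending = []  # min-heap of timer deadlines

    def set_timer(fire_at):
        heapq.heappush(pending, fire_at)

    for ts, value in elements:
        # The watermark has reached `ts`: fire every timer due before it.
        while pending and pending[0] <= ts:
            on_timer(heapq.heappop(pending))
        process(value, set_timer)
    # End of bounded input: the watermark jumps to infinity, drain timers.
    while pending:
        on_timer(heapq.heappop(pending))
```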
>
> A different question is whether this particular transform should fire
> not on event time, but rather on processing time?

Yeah, I was reading all of these as processing time. Throttling by
event time doesn't make much sense.
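For the processing-time reading, the core of a throttle is just wall-clock pacing: admit at most `rate` elements per second and block otherwise. A minimal sketch of that pacing idea (not the proposed Beam transform; clock and sleep are injected hooks so it can be exercised deterministically):

```python
import time

class ProcessingTimeThrottle:
    """Sketch of throttling by processing (wall-clock) time: admit at
    most `rate` elements per second, blocking the caller otherwise."""

    def __init__(self, rate, now=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / rate  # minimum spacing between elements
        self.now = now
        self.sleep = sleep
        self.next_free = now()

    def acquire(self):
        """Block until the next element may pass, then reserve a slot."""
        t = self.now()
        if t < self.next_free:
            self.sleep(self.next_free - t)  # pace against the wall clock
        self.next_free = max(self.next_free, t) + self.interval
```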

> On 2/21/24 03:00, Robert Burke wrote:
> > Thanks for the design Damon! And thanks for collaborating with me on 
> > getting a high level textual description of the key implementation idea 
> > down in writing. I think the solution is pretty elegant.
> >
> > I do have concerns about how different Runners might handle 
> > ProcessContinuations for the Bounded Input case. I know Dataflow famously 
> > has two different execution modes under the hood, but I agree with the 
> > principle that ProcessContinuation.Resume should largely be in line with 
> > the expected delay, though it's by no means guaranteed AFAIK.
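As a way to reason about that "in line with the expected delay, but not guaranteed" semantics: a runner driving a splittable DoFn can be pictured as a loop that re-invokes the process call after at least the requested delay. A hypothetical simulation (the `("stop", ...)` / `("resume", delay)` return shape is invented for illustration, not Beam's ProcessContinuation API):

```python
import time

def drive_sdf(process, sleep=time.sleep, max_resumes=1000):
    """Sketch of how a runner might honor ProcessContinuation-style
    resumption: `process()` returns either ("stop", result) or
    ("resume", delay_seconds). The runner waits at least delay_seconds
    before re-invoking; real runners treat the delay only as a hint,
    so the actual wait may be longer.
    """
    for _ in range(max_resumes):
        outcome, value = process()
        if outcome == "stop":
            return value
        sleep(value)  # delay before resuming; actual wait may be longer
    raise RuntimeError("too many resumes")
```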
> >
> > We should also ensure this is linked from 
> > https://s.apache.org/beam-design-docs if not already.
> >
> > Robert Burke
> > Beam Go Busybody
> >
> > On 2024/02/20 14:00:00 Damon Douglas wrote:
> >> Hello Everyone,
> >>
> >> The following describes a Throttle PTransform that holds element throughput
> >> to minimize downstream API overusage. Thank you for reading and your
> >> valuable input.
> >>
> >> https://s.apache.org/beam-throttle-transform
> >>
> >> Best,
> >>
> >> Damon
> >>
