The case where a plain vanilla value or a windowed value is emitted behaves as
expected: the user's intent is honored without any surprises.

If I understand correctly, when the timestamp is changed, applying the
window function again can cause unintended behavior in the following
cases:
* Custom windows: User code can be executed in an unintended order.
* User emits a windowed value in a previous transform: Timestamping the
value in this case would overwrite the window the user assigned in an earlier
step, even when the actual timestamp is the same. Semantically, emitting an
element or a timestamped value with the same timestamp should have the same
behavior.
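
To make the second case concrete, here is a toy model in plain Python. It is
only a sketch of the semantics described in this thread, not Beam's actual
implementation: `Windowed`, `fixed_window`, `in_window` and the `emit_*`
helpers are hypothetical names, and the fixed-window arithmetic assumes an
offset of 0.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Windowed:
    """Toy stand-in for a windowed value: value, timestamp and window."""
    value: object
    timestamp: int
    window: Tuple[int, int]  # half-open interval [start, end)


def fixed_window(timestamp, size=5):
    """Window that a fixed-size window function (offset 0) would assign."""
    start = timestamp - timestamp % size
    return (start, start + size)


def emit_plain(prev, value):
    # Plain element: timestamp and window are propagated unchanged.
    return replace(prev, value=value)


def emit_timestamped(value, timestamp, window_fn=fixed_window):
    # Timestamped value: the window is recomputed from the timestamp.
    return Windowed(value, timestamp, window_fn(timestamp))


def emit_windowed(value, timestamp, window):
    # Full windowed value: accepted as-is, nothing is checked.
    return Windowed(value, timestamp, window)


def in_window(w):
    start, end = w.window
    return start <= w.timestamp < end


src = Windowed("a", 3, fixed_window(3))
plain = emit_plain(src, "b")                 # keeps timestamp 3 and src's window
restamped = emit_timestamped("b", 12)        # window recomputed for timestamp 12
forced = emit_windowed("b", 12, src.window)  # timestamp lies outside its window

custom = Windowed("c", 3, (0, 100))          # window assigned by the user
same_ts = emit_timestamped("c", 3)           # same timestamp, window replaced
```

In this model `same_ts` carries the same timestamp as `custom` but ends up in
a different window, which is the surprise I am worried about.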

What do you think?


On Wed, Jan 15, 2020 at 4:04 PM Robert Bradshaw <rober...@google.com> wrote:

> If an element is emitted with a timestamp, the window assignment is
> re-applied at that time. At least that's how it is in Python. You can
> emit the full windowed value (accepted without checking...), a
> timestamped value (in which case the window will be computed), or a
> plain old element (in which case the window and timestamp will be
> computed (really, propagated)).
>
> On Wed, Jan 15, 2020 at 3:51 PM Ankur Goenka <goe...@google.com> wrote:
> >
> > Yup, this might result in unintended behavior, because the timestamp is
> > changed after the window assignment, so elements in a window may no longer
> > have a timestamp within that window's time range.
> >
> > Shall we start validating that there is at least one window assignment
> > between timestamp assignment and GBK/triggers, to avoid the unintended
> > behaviors mentioned above?
> >
> > On Wed, Jan 15, 2020 at 1:24 PM Luke Cwik <lc...@google.com> wrote:
> >>
> >> Window assignment happens at the point in the pipeline where the
> >> WindowInto transform was applied. So in this case the window would have
> >> been assigned using the original timestamp.
> >>
> >> Grouping is by key and window.
> >>
> >> On Tue, Jan 14, 2020 at 7:30 PM Ankur Goenka <goe...@google.com> wrote:
> >>>
> >>> Hi,
> >>>
> >>> I am not sure what effect the order of changing an element's timestamp
> >>> and assigning its window has on a Group By Key.
> >>> More specifically, what would the behavior be if we apply window ->
> >>> change element timestamp -> Group By Key?
> >>> I think we should always apply the window function after changing the
> >>> timestamp of elements, though this is neither checked nor a recommended
> >>> practice in Beam.
> >>>
> >>> Example pipeline would look like this:
> >>>
> >>>     import time
> >>>
> >>>     import apache_beam as beam
> >>>     from apache_beam.transforms import window
> >>>
> >>>     def applyTimestamp(value):
> >>>         # 'key' is a placeholder key; any constant works for this example.
> >>>         return window.TimestampedValue(('key', value), int(time.time()))
> >>>
> >>>     with beam.Pipeline() as p:
> >>>         (p
> >>>          | 'Create' >> beam.Create(range(0, 10))
> >>>          | 'Fixed Window' >> beam.WindowInto(window.FixedWindows(5))
> >>>          # Timestamp is changed after windowing and before GBK
> >>>          | 'Apply Timestamp' >> beam.Map(applyTimestamp)
> >>>          | 'Group By Key' >> beam.GroupByKey()
> >>>          | 'Print' >> beam.Map(print))
> >>>
> >>> Thanks,
> >>> Ankur
>
