Thanks Kurt and Jark for the detailed explanations! They helped a lot in
understanding FLIP-66.

That sounds like Flink won't rely on the timestamp in StreamRecord (which is
hidden and cannot be modified easily) and instead handles the time semantics
via the input schema of the operation, to unify the semantics between batch
and stream. Did I understand it correctly?

I'm not familiar with the internals of Flink, so it's not easy for me to
digest the information in FLINK-11286, but in general I'd be supportive of
defining the watermark as close as possible to the source, as it'll be easier
to reason about. (I'm basically referring to the timestamp assigner rather
than the watermark assigner, though.)
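
To restate Kurt's earlier point in query form as I understood it (again just
a sketch with made-up names, and I may be off on the exact built-in
functions):

-- a simple projection/filter keeps order_time as a valid event-time attribute
SELECT order_id, order_time FROM orders WHERE price > 10;

-- a window aggregate exposes its own event-time attribute,
-- e.g. via TUMBLE_ROWTIME in the group window syntax
SELECT
  TUMBLE_ROWTIME(order_time, INTERVAL '1' MINUTE) AS window_time,
  COUNT(*) AS cnt
FROM orders
GROUP BY TUMBLE(order_time, INTERVAL '1' MINUTE);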

- Jungtaek Lim

On Tue, Apr 28, 2020 at 11:37 AM Jark Wu <imj...@gmail.com> wrote:

> Hi Jungtaek,
>
> Kurt has said what I want to say. I will add some background.
> Flink Table API & SQL only supports defining the processing-time attribute
> and the event-time attribute (watermark) on the source; it does not support
> defining a new one in a query.
> The time attributes pass through the query, and time-based operations can
> only be applied to the time attributes.
>
> The reason Flink Table & SQL only supports defining the watermark on the
> source is that this allows us to do per-partition watermarks and source
> idleness handling, and simplifies things.
> There is also some discussion about disabling arbitrary watermark assigners
> in the middle of a pipeline in DataStream in the comments of the JIRA issue
> [1].
>
> Best,
> Jark
>
> [1]: https://issues.apache.org/jira/browse/FLINK-11286
>
>
> On Tue, 28 Apr 2020 at 09:28, Kurt Young <ykt...@gmail.com> wrote:
>
> > The current behavior is the latter. Flink gets the time attribute column
> > from the source table, and tries to analyze and keep the time attribute
> > column as much as possible, e.g. a simple projection or filter which
> > doesn't affect the column will keep the time attribute, and a window
> > aggregate will generate its own time attribute if you select window_start
> > or window_end. But you're right, sometimes the framework will lose the
> > information about the time attribute column, and after that, some
> > operations will throw exceptions.
> >
> > Best,
> > Kurt
> >
> >
> > On Tue, Apr 28, 2020 at 7:45 AM Jungtaek Lim <kabhwan.opensou...@gmail.com>
> > wrote:
> >
> > > Hi devs,
> > >
> > > I'm interested in the new change in FLIP-66 [1], because if I understand
> > > correctly, Flink hasn't had the event-time timestamp field (column) as a
> > > part of the "normal" schema, and FLIP-66 tries to change that.
> > >
> > > That sounds like the column may be open for modification, like a rename
> > > (alias) or some other operations, or may even be dropped via projection.
> > > Will such operations affect the event-time timestamp for the record? If
> > > you have an idea of how Spark Structured Streaming works with watermarks,
> > > then you might catch the point.
> > >
> > > Maybe the question could be reworded as: does the definition of the
> > > event-time timestamp column in DDL only apply to the source definition,
> > > or does it carry over the entire query, letting each operator treat that
> > > column as the event-time timestamp? (SSS works as the latter.) I think
> > > this is a huge difference; for me it's like stability vs. flexibility,
> > > and there are drawbacks to the latter (there are also drawbacks to the
> > > former, but computed columns may cover them).
> > >
> > > Thanks in advance!
> > > Jungtaek Lim (HeartSaVioR)
> > >
> > > 1.
> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-66%3A+Support+Time+Attribute+in+SQL+DDL
> > >
> >
>
