Thanks Feng, the time travel feature sounds great!

In addition to SYSTEM_TIME, lakehouses such as Paimon and Iceberg support
querying by snapshot or version. For example, users can query snapshot 1
in Paimon with the following statement:
SELECT * FROM t VERSION AS OF 1

Could we support this in Flink too?
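To make the two forms concrete, here is a minimal Java sketch of how a connector might resolve each of them to a snapshot. It assumes the table exposes a list of snapshots with commit times in epoch milliseconds. The `Snapshot` record and the `toEpochMillis`, `asOfTime`, `asOfVersion` and `latest` helpers are hypothetical names for illustration, not Flink, Paimon or Iceberg APIs. Following the conversion Feng describes further down the thread, a zone-less TIMESTAMP literal is interpreted in a session time zone and turned into a long value before the lookup, and the latest snapshot is used when no version is given.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

class TimeTravelSketch {

    /** Hypothetical snapshot metadata: an ID plus its commit time in epoch milliseconds. */
    record Snapshot(long id, long commitMillis) {}

    /** TIMESTAMP -> TIMESTAMP_LTZ -> long: interpret a zone-less literal in the session time zone. */
    static long toEpochMillis(String timestampLiteral, ZoneId sessionZone) {
        LocalDateTime local = LocalDateTime.parse(timestampLiteral.replace(' ', 'T'));
        // atZone() moves a time inside a DST gap forward and picks the earlier offset in an overlap.
        return local.atZone(sessionZone).toInstant().toEpochMilli();
    }

    /** FOR SYSTEM_TIME AS OF: the latest snapshot committed at or before the given instant. */
    static Optional<Snapshot> asOfTime(List<Snapshot> snapshots, long epochMillis) {
        return snapshots.stream()
                .filter(s -> s.commitMillis() <= epochMillis)
                .max(Comparator.comparingLong(Snapshot::commitMillis));
    }

    /** VERSION AS OF: exact lookup by snapshot ID. */
    static Optional<Snapshot> asOfVersion(List<Snapshot> snapshots, long id) {
        return snapshots.stream().filter(s -> s.id() == id).findFirst();
    }

    /** No version specified: default to the latest snapshot. */
    static Optional<Snapshot> latest(List<Snapshot> snapshots) {
        return snapshots.stream().max(Comparator.comparingLong(Snapshot::commitMillis));
    }

    public static void main(String[] args) {
        ZoneId zone = ZoneId.of("Asia/Shanghai");
        List<Snapshot> snapshots = List.of(
                new Snapshot(1, toEpochMillis("2023-04-26 12:00:00", zone)),
                new Snapshot(2, toEpochMillis("2023-04-27 08:00:00", zone)));

        long asOf = toEpochMillis("2023-04-27 00:00:00", zone);
        System.out.println(asOfTime(snapshots, asOf).map(Snapshot::id).orElse(-1L)); // 1
        System.out.println(asOfVersion(snapshots, 2).map(Snapshot::id).orElse(-1L)); // 2
        System.out.println(latest(snapshots).map(Snapshot::id).orElse(-1L)); // 2
    }
}
```

Note that `atZone` silently shifts a local time that falls into a daylight saving gap (for example, 02:30 on 2023-03-12 in America/New_York becomes 03:30 EDT), which is one reason the time zone semantics need to be pinned down in the FLIP.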

Best,
Shammon FY

On Fri, May 26, 2023 at 1:20 PM Benchao Li <libenc...@apache.org> wrote:

> Regarding the implementation, have you considered compatibility with the
> pushdown abilities, e.g., projection pushdown, filter pushdown, and
> partition pushdown? Since `Snapshot` is not handled much in the existing
> rules, I have a concern about this. Of course, it depends on your
> implementation details; what is important is that we add some cross tests
> for these.
>
> Regarding the interface exposed to connectors, I see there is a rejected
> design that adds SupportsTimeTravel, but I don't see the alternative in
> the FLIP doc. IMO, this is important to clarify, because we need to know
> whether the connector supports this and which column/metadata
> corresponds to 'system_time'.
>
> On Thu, May 25, 2023 at 10:50 PM Feng Jin <jinfeng1...@gmail.com> wrote:
>
> > Thanks for your replies.
> >
> > @Timo @BenChao @yuxia
> >
> > Sorry for the mistake. Currently, Calcite only supports the
> > `FOR SYSTEM_TIME AS OF` syntax, so we can only support
> > `FOR SYSTEM_TIME AS OF`. I've updated the syntax section of the FLIP.
> >
> >
> > @Timo
> >
> > > We will convert it to TIMESTAMP_LTZ?
> >
> > Yes, I think we need to convert TIMESTAMP to TIMESTAMP_LTZ and then
> > convert it into a long value.
> >
> > > How do we want to query the most recent version of a table?
> >
> > I think we can use `AS OF CURRENT_TIMESTAMP`, but it does cause
> > inconsistency with the real-time concept. However, in my understanding,
> > the scope of `AS OF CURRENT_TIMESTAMP` is the table itself, not the
> > table records, so I think using CURRENT_TIMESTAMP should also be
> > reasonable? Additionally, if no version is specified, the latest
> > version should be used by default.
> >
> >
> >
> > Best,
> > Feng
> >
> >
> > On Thu, May 25, 2023 at 7:47 PM yuxia <luoyu...@alumni.sjtu.edu.cn> wrote:
> >
> > > Thanks Feng for bringing this up. It'll be great to introduce time
> > > travel to Flink for better integration with external data sources.
> > >
> > > I also share the same concern about the syntax.
> > > In the `Whether to support other syntax implementations` section of
> > > this FLIP, it seems the syntax in Calcite should be
> > > `FOR SYSTEM_TIME AS OF`, right?
> > > But in the syntax section of this FLIP, it seems to be `AS OF TIMESTAMP`
> > > instead of `FOR SYSTEM_TIME AS OF`. Is that just a mistake or by design?
> > >
> > >
> > > Best regards,
> > > Yuxia
> > >
> > > ----- Original Message -----
> > > From: "Benchao Li" <libenc...@apache.org>
> > > To: "dev" <dev@flink.apache.org>
> > > Sent: Thursday, May 25, 2023 7:27:17 PM
> > > Subject: Re: [DISCUSS] FLIP-308: Support Time Travel In Batch Mode
> > >
> > > Thanks Feng, it's exciting to have this ability.
> > >
> > > Regarding the syntax section, are you proposing `AS OF` instead of
> > > `FOR SYSTEM_TIME AS OF` to do this? I know `FOR SYSTEM_TIME AS OF` is in
> > > the SQL standard and has been supported by some database vendors such
> > > as SQL Server. As for `AS OF`, is it in the standard, or does any
> > > database vendor support it? If so, I think it's worth adding this
> > > support to Calcite, and I would give a hand on the Calcite side.
> > > Otherwise, I think we'd better use `FOR SYSTEM_TIME AS OF`.
> > >
> > > On Thu, May 25, 2023 at 7:02 PM Timo Walther <twal...@apache.org> wrote:
> > >
> > > > Also: How do we want to query the most recent version of a table?
> > > >
> > > > `AS OF CURRENT_TIMESTAMP` would be ideal, but according to the docs,
> > > > its type is TIMESTAMP_LTZ and, what is even more concerning, it is
> > > > actually evaluated per row:
> > > >
> > > >  > Returns the current SQL timestamp in the local time zone, the return
> > > >  > type is TIMESTAMP_LTZ(3). It is evaluated for each record in streaming
> > > >  > mode. But in batch mode, it is evaluated once as the query starts and
> > > >  > uses the same result for every row.
> > > >
> > > > This could make the semantics difficult to explain when joining
> > > > multiple snapshotted tables.
> > > >
> > > > Regards,
> > > > Timo
> > > >
> > > >
> > > > On 25.05.23 12:29, Timo Walther wrote:
> > > > > Hi Feng,
> > > > >
> > > > > thanks for proposing this FLIP. It makes a lot of sense to finally
> > > > > support querying tables at a specific point in time, and hopefully
> > > > > also time ranges soon, following time-versioned tables.
> > > > >
> > > > > Here is some feedback from my side:
> > > > >
> > > > > 1. Syntax
> > > > >
> > > > > Can you elaborate a bit on the Calcite restrictions?
> > > > >
> > > > > Does Calcite currently support the `AS OF` syntax for this but not
> > > > > `FOR SYSTEM_TIME AS OF`?
> > > > >
> > > > > It would be great to support `AS OF` also for time-versioned joins
> > > > > and have a unified and short syntax.
> > > > >
> > > > > Once a fix for this is merged in Calcite, we can make it available
> > > > > in Flink earlier by copying the corresponding classes until the next
> > > > > Calcite upgrade is performed.
> > > > >
> > > > > 2. Semantics
> > > > >
> > > > > How do we interpret the timestamp? In Flink we have 2 timestamp types
> > > > > (TIMESTAMP and TIMESTAMP_LTZ). If users specify AS OF TIMESTAMP
> > > > > '2023-04-27 00:00:00', in which time zone will the timestamp be? We
> > > > > will convert it to TIMESTAMP_LTZ?
> > > > >
> > > > > We definitely need to clarify this because the past has shown that
> > > > > daylight saving time makes our lives hard.
> > > > >
> > > > > Thanks,
> > > > > Timo
> > > > >
> > > > > On 25.05.23 10:57, Feng Jin wrote:
> > > > >> Hi, everyone.
> > > > >>
> > > > >> I’d like to start a discussion about FLIP-308: Support Time Travel
> > > > >> In Batch Mode [1]
> > > > >>
> > > > >>
> > > > >> Time travel is SQL syntax used to query historical versions of
> > > > >> data. It allows users to specify a point in time and retrieve the
> > > > >> data and schema of a table as it appeared at that time. With time
> > > > >> travel, users can easily analyze and compare historical versions of
> > > > >> data.
> > > > >>
> > > > >>
> > > > >> With the widespread use of data lake systems such as Paimon,
> > > > >> Iceberg, and Hudi, time travel can make data analysis more
> > > > >> convenient for users.
> > > > >>
> > > > >>
> > > > >> Looking forward to your opinions; any suggestions are welcome.
> > > > >>
> > > > >>
> > > > >>
> > > > >> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-308%3A+Support+Time+Travel+In+Batch+Mode
> > > > >>
> > > > >>
> > > > >>
> > > > >> Best,
> > > > >>
> > > > >> Feng
> > > > >>
> > > > >
> > > >
> > > >
> > >
> > > --
> > >
> > > Best,
> > > Benchao Li
> > >
> >
>
>
> --
>
> Best,
> Benchao Li
>
