Thanks Jane for driving this exciting feature, looking forward to the vote!

Jane Chan <qingyue....@gmail.com> 于2023年10月23日周一 09:40写道:
>
> Hi, devs,
>
> Thanks for all the feedback.
>
> Based on the discussion [1], we seem to have a consensus, so I would like
> to start a vote. If you have any questions or concerns, please feel free to
> follow up on this discussion.
>
> [1] https://lists.apache.org/thread/3s69dhv3rp4s0kysnslqbvyqo3qf7zq5
>
>
> Best regards,
> Jane
>
> On Fri, Oct 13, 2023 at 5:36 PM Zakelly Lan <zakelly....@gmail.com> wrote:
>
> > Hi Jane,
> >
> > Actually, the first example that comes to mind is a simple group
> > aggregation, such as "SELECT /*+ STATE_TTL('1d') */ userid, SUM(price)
> > FROM orders GROUP BY userid;". However, as you mentioned, omitting the
> > key can introduce a lot of ambiguity, especially in nested query
> > blocks. I agree that we should not support this syntax.
> >
> >
> > Thanks a lot for clarifying this for me.
> >
> >
> > Best,
> > Zakelly
> >
> >
> > On Fri, Oct 13, 2023 at 4:59 PM Jane Chan <qingyue....@gmail.com> wrote:
> > >
> > > Hi Zakelly,
> > >
> > > What if an omitted key hint only applied for tables from the current
> > query
> > > > block?
> > > >
> > >
> > > Can you elaborate on some examples to illustrate? But if you're asking
> > > about the usage mentioned in [1], like " SELECT * FROM left /*+
> > STATE_TTL(
> > > '1h' ) */ JOIN right /*+ STATE_TTL('1d') */ ON ...; ", I'd like to
> > explain
> > > the reason why this syntax is not proposed in the following aspects:
> > >
> > > #1 According to the current hint syntax, hints that follow the
> > > SqlIdentifier are deemed as **DynamicTableOptions** [2]. However,
> > > "table.exec.state.ttl" is not a table option. If we want to support this
> > > approach, a separate thread may be required to discuss whether hints
> > > that follow an identifier should be limited to table options only.
> > >
> > > #2 On the other hand, according to the "Options Propagation" mentioned in
> > > FLIP-113 [3],
> > >
> > > > Let T be the table name that attaches hints:
> > >
> > > If T is a view, ignore the hints, we do not allow any query hints in the
> > > > table hints context, which is ambiguous because the view is actually a
> > > > query and itself can take a query hint;
> > >
> > >  Since the "left" and "right" (the example above) could be either a table
> > > name or a view name (alias to the query block). The hint key cannot be
> > > omitted if placed on the query block. So, from this point, having a
> > unified
> > > syntax that covers all situations is challenging.
> > >
> > > [1]
> > >
> > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/hints/#examples
> > > <
> > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/hints/#examples
> > >
> > > [2]
> > >
> > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/hints/#dynamic-table-options
> > > <
> > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/hints/#dynamic-table-options
> > >
> > > [3]
> > >
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-113%3A+Supports+Dynamic+Table+Options+for+Flink+SQL
> > > <
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-113%3A+Supports+Dynamic+Table+Options+for+Flink+SQL
> > >
> > >
> > > Best,
> > > Jane
> > >
> > > On Thu, Oct 12, 2023 at 7:02 PM Zakelly Lan <zakelly....@gmail.com>
> > wrote:
> > >
> > > > Thanks Jane for clarifying this. I'm ok with keeping the current hint
> > > > pattern with the table name specified by key. Just thinking if there
> > > > is another simpler way to define the hint. What if an omitted key hint
> > > > only applied for tables from the current query block?
> > > >
> > > >
> > > > Thanks.
> > > >
> > > >
> > > > Best,
> > > > Zakelly
> > > >
> > > > On Wed, Oct 11, 2023 at 11:25 PM Jane Chan <qingyue....@gmail.com>
> > wrote:
> > > > >
> > > > > Thank you all for the valuable feedback.
> > > > >
> > > > > There is a difference when users incorrectly use hints, and there
> > are two
> > > > > possible scenarios:
> > > > > <1> STATE_TTL hint can be applied to the current query block but
> > with an
> > > > > invalid key or value. In this case, the validation exception will be
> > > > > thrown. The HintOptionChecker will throw exceptions if the options
> > are
> > > > > empty or the value is an invalid duration. And JoinHintResolver will
> > > > throw
> > > > > exceptions if the hint key(table name/alias, etc.) does not exist. So
> > > > does
> > > > > for the group aggregate.
> > > > >
> > > > > <2> STATE_TTL hint cannot be applied to the current query block,
> > e.g.,
> > > > > SELECT /*+ STATE_TTL('MyTable' = '2h') */ * a, b, c FROM MyTable. In
> > this
> > > > > case, the hint is ignored. This is a by-design behavior according to
> > > > > FLIP-229 [1].
> > > > >
> > > > > I've made the modifications to the FLIP regarding exception
> > handling. I
> > > > > would appreciate it if you could review it again.
> > > > >
> > > > > @Xuyang
> > > > >
> > > > > >  I notice that using STATE_TTL hints wrongly will not throw any
> > > > > > exceptions. But it seems that in the current join hint scenario, if
> > > > user
> > > > > > uses an unknown table name as the chosen side, a validation
> > exception
> > > > will
> > > > > > be thrown. Maybe we should distinguish which exceptions need to be
> > > > thrown
> > > > > > explicitly.
> > > > >
> > > > >
> > > > > You're right; the STATE_TTL hint semantic check should throw
> > exceptions
> > > > > like join hints.
> > > > >
> > > > > @Feng @Yun
> > > > >
> > > > > > since there is no exception thrown when the STATE hint is set
> > > > incorrectly,
> > > > > > should we somehow show that the TTL setting has taken effect?
> > > > >
> > > > > For instance, exhibiting the TTL option within the operator's
> > > > description?
> > > > >
> > > > >
> > > > > We can throw explicit exceptions for scenario #1. For scenario #2, I
> > > > prefer
> > > > > to align the behavior for current query block hints for now (and we
> > may
> > > > > open a separate discussion in the future). On the other hand, from
> > the
> > > > > implementation aspect, it is not easy to do so. Taking the example of
> > > > > deduplication mentioned earlier, the hint is lost before it
> > propagates to
> > > > > the FlinkLogicalRank node, making it challenging to capture the
> > > > exception.
> > > > >
> > > > > @Zakelly
> > > > >
> > > > > > would it be possible to provide a simple hint that allows the
> > omission
> > > > of
> > > > > > the key? For example, something like "SELECT /+ STATE_TTL('1h')/",
> > > > which
> > > > > > would specify the TTL for all states in the 'SELECT' clause.
> > > > >
> > > > >
> > > > > I'm afraid the hint key cannot be omitted. E.g., the join operator
> > is a
> > > > > TwoInputStreamOperator; if we want to specify different state TTLs
> > for
> > > > the
> > > > > left and right input, we need to inform the planner of the ordinal
> > info
> > > > > (LEFT v.s. RIGHT). On the other hand, the SELECT clause may contain
> > > > several
> > > > > query blocks, while the hint scope is designed to apply to the
> > current
> > > > > query block [2] to prevent the hint on the outer query block from
> > > > > propagating to the inner query block. For the use case where users
> > need
> > > > to
> > > > > specify the TTL for all states in the 'SELECT' clause, it is
> > preferable
> > > > to
> > > > > modify the compiled plan instead of using hints.
> > > > >
> > > > > @ConradJam
> > > > > I share the same opinion with @Yun that this is a nice-to-have
> > feature.
> > > > Big
> > > > > +1 for the follow-up on FLINK-33230. Users can now use the COMPILE
> > PLAN
> > > > > statement to serialize the query to a JSON string or file and then
> > check
> > > > > the state metadata to ensure the hint is applied.
> > > > >
> > > > > [1]
> > > > >
> > > >
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-229%3A+Introduces+Join+Hint+for+Flink+SQL+Batch+Job
> > > > > <
> > > >
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-229%3A+Introduces+Join+Hint+for+Flink+SQL+Batch+Job
> > > > >
> > > > > [2]
> > > > >
> > > >
> > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/hints/#query-hints
> > > > > <
> > > >
> > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/hints/#query-hints
> > > > >
> > > > >
> > > > > Best,
> > > > > Jane
> > > > >
> > > > >
> > > > > On Wed, Oct 11, 2023 at 8:49 PM Yun Tang <myas...@live.com> wrote:
> > > > >
> > > > > > I think showing the TTL for operators is a nice-to-have feature,
> > not a
> > > > > > must one in this FLIP. We can still get the information from the
> > > > operator
> > > > > > descriptions.
> > > > > >
> > > > > > And I think we can continue the TTL showing work based on
> > FLINK-33230
> > > > [1].
> > > > > >
> > > > > > Last but not least, I prefer to throw exceptions if the TTL
> > > > configuration
> > > > > > is mistakenly used as it will affect the data correctness.
> > > > > >
> > > > > > [1] https://issues.apache.org/jira/browse/FLINK-33230
> > > > > >
> > > > > > Best
> > > > > > Yun Tang
> > > > > > ________________________________
> > > > > > From: ConradJam <jam.gz...@gmail.com>
> > > > > > Sent: Wednesday, October 11, 2023 20:30
> > > > > > To: dev@flink.apache.org <dev@flink.apache.org>
> > > > > > Subject: Re: Re: [DISCUSS] FLIP-373: Support Configuring Different
> > > > State
> > > > > > TTLs using SQL Hint
> > > > > >
> > > > > > +1 TTL shows the state ttl for operators in Flink web ui can be
> > know
> > > > what
> > > > > > operator state
> > > > > >
> > > > > > Zakelly Lan <zakelly....@gmail.com> 于2023年10月11日周三 19:14写道:
> > > > > >
> > > > > > > Hi Jane,
> > > > > > >
> > > > > > > The fine-grained TTL management is extremely useful for
> > performance
> > > > > > > tuning, so +1 for the idea. I have a minor suggestion: would it
> > be
> > > > > > > possible to provide a simple hint that allows the omission of the
> > > > key?
> > > > > > > For example, something like "SELECT /+ STATE_TTL('1h')/", which
> > would
> > > > > > > specify the TTL for all states in the 'SELECT' clause.
> > > > > > >
> > > > > > > And I also share the same concern as Feng. I am wondering if we
> > could
> > > > > > > show the state ttl for operators in Flink UI.
> > > > > > >
> > > > > > >
> > > > > > > Best,
> > > > > > > Zakelly
> > > > > > >
> > > > > > > On Wed, Oct 11, 2023 at 1:27 PM Feng Jin <jinfeng1...@gmail.com>
> > > > wrote:
> > > > > > > >
> > > > > > > > Hi Jane,
> > > > > > > >
> > > > > > > > Thank you for providing this explanation.
> > > > > > > >
> > > > > > > > Another small question, since there is no exception thrown
> > when the
> > > > > > STATE
> > > > > > > > hint is set incorrectly,
> > > > > > > > should we somehow show that the TTL setting has taken effect?
> > > > > > > > For instance, exhibiting the TTL option within the operator's
> > > > > > > description?
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Feng
> > > > > > > >
> > > > > > > > On Tue, Oct 10, 2023 at 7:51 PM Xuyang <xyzhong...@163.com>
> > wrote:
> > > > > > > >
> > > > > > > > > Hi, Jane.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > I think this syntax will be easier for users to set operator
> > > > ttl. So
> > > > > > > big
> > > > > > > > > +1. I left some minor comments here.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > I notice that using STATE_TTL hints wrongly will not throw
> > any
> > > > > > > exceptions.
> > > > > > > > > But it seems that in the current join hint scenario,
> > > > > > > > > if user uses an unknown table name as the chosen side, a
> > > > validation
> > > > > > > > > exception will be thrown.
> > > > > > > > > Maybe we should distinguish which exceptions need to be
> > thrown
> > > > > > > explicitly.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > >
> > > > > > > > >     Best!
> > > > > > > > >     Xuyang
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > At 2023-10-10 18:23:55, "Jane Chan" <qingyue....@gmail.com>
> > > > wrote:
> > > > > > > > > >Hi Feng,
> > > > > > > > > >
> > > > > > > > > >Thank you for your valuable comments. The reason for not
> > > > including
> > > > > > the
> > > > > > > > > >scenarios above is as follows:
> > > > > > > > > >
> > > > > > > > > >For <1>, the automatically inferred stateful operators are
> > not
> > > > > > easily
> > > > > > > > > >expressible in SQL. This issue was discussed in FLIP-292,
> > and
> > > > > > besides
> > > > > > > > > >ChangelogNormalize, SinkUpsertMateralizer also faces the
> > same
> > > > > > problem.
> > > > > > > > > >
> > > > > > > > > >For <2> and <3>, the challenge lies in internal
> > implementation.
> > > > > > > During the
> > > > > > > > > >default_rewrite phase, the row_number expression in
> > > > LogicalProject
> > > > > > is
> > > > > > > > > >transformed into LogicalWindow by Calcite's
> > > > > > > > > >CoreRules#PROJECT_TO_LOGICAL_PROJECT_AND_WINDOW. However,
> > > > > > > CalcRelSplitter
> > > > > > > > > >does not pass the hints as an input argument when creating
> > > > > > > LogicalWindow,
> > > > > > > > > >resulting in the loss of the hint at this step. To support
> > > > this, we
> > > > > > > may
> > > > > > > > > >need to rewrite some optimization rules in Calcite, which
> > could
> > > > be a
> > > > > > > > > >follow-up work if required.
> > > > > > > > > >
> > > > > > > > > >Best,
> > > > > > > > > >Jane
> > > > > > > > > >
> > > > > > > > > >On Tue, Oct 10, 2023 at 1:40 AM Feng Jin <
> > jinfeng1...@gmail.com
> > > > >
> > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > >> Hi Jane,
> > > > > > > > > >>
> > > > > > > > > >> Thank you for proposing this FLIP.
> > > > > > > > > >>
> > > > > > > > > >> I believe that this FLIP will greatly enhance the
> > flexibility
> > > > of
> > > > > > > setting
> > > > > > > > > >> state, and by setting different operators' TTL, it can
> > also
> > > > > > > increase job
> > > > > > > > > >> stability, especially in regular join scenarios.
> > > > > > > > > >> The parameter design is very concise, big +1 for this,
> > and it
> > > > is
> > > > > > > also
> > > > > > > > > >> relatively easy to use for users.
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > > >> I have a small question: in the FLIP, it only mentions
> > join
> > > > and
> > > > > > > group.
> > > > > > > > > >> Should we also consider other scenarios?
> > > > > > > > > >>
> > > > > > > > > >> 1. the auto generated deduplicate operator[1].
> > > > > > > > > >> 2. the deduplicate query[2].
> > > > > > > > > >> 3. the topN query[3].
> > > > > > > > > >>
> > > > > > > > > >> [1]
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > >
> > > > > > >
> > > > > >
> > > >
> > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/config/#table-exec-source-cdc-events-duplicate
> > > > > > > > > >> [2]
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > >
> > > > > > >
> > > > > >
> > > >
> > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/deduplication/
> > > > > > > > > >> [3]
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > >
> > > > > > >
> > > > > >
> > > >
> > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/topn/
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > > >> Best,
> > > > > > > > > >> Feng
> > > > > > > > > >>
> > > > > > > > > >> On Sun, Oct 8, 2023 at 5:53 PM Jane Chan <
> > > > qingyue....@gmail.com>
> > > > > > > wrote:
> > > > > > > > > >>
> > > > > > > > > >> > Hi devs,
> > > > > > > > > >> >
> > > > > > > > > >> > I'd like to initiate a discussion on FLIP-373: Support
> > > > > > Configuring
> > > > > > > > > >> > Different State TTLs using SQL Hint [1]. This proposal
> > is
> > > > on top
> > > > > > > of
> > > > > > > > > the
> > > > > > > > > >> > FLIP-292 [2] to address typical scenarios with
> > unambiguous
> > > > > > > semantics
> > > > > > > > > and
> > > > > > > > > >> > hint propagation.
> > > > > > > > > >> >
> > > > > > > > > >> > I'm looking forward to your opinions!
> > > > > > > > > >> >
> > > > > > > > > >> >
> > > > > > > > > >> > [1]
> > > > > > > > > >> >
> > > > > > > > > >> >
> > > > > > > > > >>
> > > > > > > > >
> > > > > > >
> > > > > >
> > > >
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-373%3A+Support+Configuring+Different+State+TTLs+using+SQL+Hint
> > > > > > > > > >> > [2]
> > > > > > > > > >> >
> > > > > > > > > >> >
> > > > > > > > > >>
> > > > > > > > >
> > > > > > >
> > > > > >
> > > >
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-292%3A+Enhance+COMPILED+PLAN+to+support+operator-level+state+TTL+configuration
> > > > > > > > > >> >
> > > > > > > > > >> > Best,
> > > > > > > > > >> > Jane
> > > > > > > > > >> >
> > > > > > > > > >>
> > > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Best
> > > > > >
> > > > > > ConradJam
> > > > > >
> > > >
> >



-- 

Best,
Benchao Li

Reply via email to