Hi all,

Thanks for all discussions!

Anyone else have questions or suggestions? if not, I will start a vote
thread later.

Best
Kui Yuan

kui yuan <catye...@gmail.com> 于2023年2月27日周一 20:21写道:

> Hi Timo,
>
> Thanks for your advice. I totally agree with your suggestion of naming
> convention, I will rename these options and update the flip later, thanks
> very much.
>
> In our internal implementation we had put these options inside the
> `FactoryUtil`, just as you expect.  We have also taken into account the
> changes to the CompiledPlan and we have packaged these options
> appropriately to minimize intrusiveness and ensure the compatibility to the
> `WatermarkPushDownSpec`.
>
> > A hint to the implementation: I would suggest that we add those options
> > to `FactoryUtil`. All cross-connector options should end up there.
>
>
> > Please also consider the changes to the CompiledPlan in your FLIP. This
> > change has implications on the JSON format as watermark strategy of
> > ExecNode becomes more complex, see WatermarkPushDownSpec
>
> Best
> Kui Yuan
>
> Timo Walther <twal...@apache.org> 于2023年2月27日周一 18:05写道:
>
>> Hi Kui Yuan,
>>
>> thanks for working on this FLIP. Let me also give some comments about
>> the proposed changes.
>>
>> I support the direction of this FLIP about handling these
>> watermark-specific properties through options and /*+OPTIONS(...) */
>> hints.
>>
>> Regarding naming, I would like to keep the options in sync with existing
>> options:
>>
>>  > 'watermark.emit.strategy'='ON_EVENT'
>>
>> Let's use lower case (e.g. `on-event`) that matches with properties like
>> sink.partitioner [1] or sink.delivery-guarantee [2].
>>
>>  > 'source.idle-timeout'='1min'
>>
>> According to FLIP-122 [3], we want to prefix all scan-source related
>> properties with `scan.*`. This clearly includes idle-timeout and
>> actually also watermark strategies which don't apply for lookup sources.
>>
>> Summarizing the comments above, we should use the following options:
>>
>> 'scan.watermark.emit.strategy'='on-event',
>> 'scan.watermark.emit.on-event.gap'='10000',
>> 'scan.idle-timeout'='1min',
>> 'scan.watermark.alignment.group'='alignment-group-1',
>> 'scan.watermark.alignment.max-drift'='1min',
>> 'scan.watermark.alignment.update-interval'='1s'
>>
>> I know that this makes the keys even longer, but given that those
>> options are for power users this should be acceptable. It also clearly
>> indicates which options are for sinks, scans, and lookups. This
>> potentially also helps in allow lists.
>>
>> A hint to the implementation: I would suggest that we add those options
>> to `FactoryUtil`. All cross-connector options should end up there.
>>
>> Please also consider the changes to the CompiledPlan in your FLIP. This
>> change has implications on the JSON format as watermark strategy of
>> ExecNode becomes more complex, see WatermarkPushDownSpec [4].
>>
>> Regards,
>> Timo
>>
>>
>> [1]
>>
>> https://nightlies.apache.org/flink/flink-docs-release-1.11/dev/table/connectors/kafka.html#sink-partitioner
>> [2]
>>
>> https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/connectors/table/kafka/#sink-delivery-guarantee
>> [3]
>>
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-122%3A+New+Connector+Property+Keys+for+New+Factory
>> [4]
>>
>> https://github.com/apache/flink/blob/master/flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/plan/abilities/source/WatermarkPushDownSpec.java
>>
>>
>> On 24.02.23 04:55, kui yuan wrote:
>> > Hi all,
>> >
>> > I have updated the flip according to the discussion, and we will extend
>> the
>> > watermark-related features with both table options and 'OPTIONS' hint,
>> like
>> > this:
>> >
>> > ```
>> > -- configure in table options
>> > CREATE TABLE user_actions (
>> >    ...
>> >    user_action_time TIMESTAMP(3),
>> >    WATERMARK FOR user_action_time AS user_action_time - INTERVAL '5'
>> SECOND
>> > ) WITH (
>> >    'watermark.emit.strategy'='ON_PERIODIC',
>> >    ...
>> > );
>> >
>> > -- use 'OPTIONS' hint
>> > select ... from source_table /*+ OPTIONS('watermark.emit.strategy'=
>> > 'ON_PERIODIC') */
>> > ```
>> >
>> > Does everybody have any other questions?
>> >
>> > Best
>> > Kui Yuan
>> >
>> > kui yuan <catye...@gmail.com> 于2023年2月23日周四 20:05写道:
>> >
>> >> Hi all,
>> >>
>> >> Thanks for all suggestions.
>> >>
>> >> We will extend the watermark-related features in SQL layer with dynamic
>> >> table options and 'OPTIONS' hint, just as everyone expects. I will
>> modify
>> >> Flip-296 as discussed.
>> >>
>> >> @Martijn As far as I know, there is no hint interface in the table API,
>> >> so we can't use hint in table API directly. if we need to extend the
>> hint
>> >> interface in the table API, maybe we need another flip. However, if we
>> >> extend the watermark-related features in the dynamic table options,
>> maybe
>> >> we are able to use them indirectly in the table API like this[1]:
>> >>
>> >> ```
>> >> // register a table named "Orders"
>> >> tableEnv.executeSql("CREATE TABLE Orders (`user` BIGINT, product
>> STRING,
>> >> amount INT) WITH ('watermark.emit.strategy'='ON_EVENT'...)");
>> >> ```
>> >>
>> >> [1]
>> >>
>> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/create/
>> >>
>> >> Best
>> >> Kui Yuan
>> >>
>> >> Yun Tang <myas...@live.com> 于2023年2月23日周四 17:46写道:
>> >>
>> >>> Thanks for the warm discussions!
>> >>>
>> >>> I had an offline discussion with Kui about the replies. I think I
>> could
>> >>> give some explanations on the original intention to introduce another
>> >>> WATERMARK_PARAMS. If we take a look at the current datastream API, the
>> >>> watermark strategy does not belong to any specific connector. And we
>> >>> thought the dynamic table options were more like the configurations
>> within
>> >>> some specific connector.
>> >>>
>> >>>  From the review comments, I think most people feel good to make it
>> part
>> >>> of the dynamic table options. I think this is fine if we give more
>> clear
>> >>> scope definition of the dynamic table options here. And I also agree
>> with
>> >>> Jingsong's concern about adding SQL syntax which is the most
>> concerning
>> >>> part before launching this discussion.
>> >>>
>> >>> For Martijn's concern, if we accept to make the watermark-related
>> options
>> >>> part of dynamic table options, the problem becomes another topic: how
>> to
>> >>> support the dynamic table options in table API, which is deserved to
>> create
>> >>> another FLIP.
>> >>>
>> >>> Best
>> >>> Yun Tang
>> >>> ________________________________
>> >>> From: Martijn Visser <martijnvis...@apache.org>
>> >>> Sent: Thursday, February 23, 2023 17:14
>> >>> To: dev@flink.apache.org <dev@flink.apache.org>
>> >>> Subject: Re: [DISCUSS] FLIP-296: Watermark options for table API & SQL
>> >>>
>> >>> Hi,
>> >>>
>> >>> While I can understand that there's a desire to first focus on solving
>> >>> this
>> >>> problem for SQL, I do wonder if we should ignore the Table API at this
>> >>> point. If we could include the syntax for the Table API, it
>> potentially
>> >>> could also be implemented by another contributor without needing to
>> create
>> >>> another FLIP. If we don't design it right now, my concern is that this
>> >>> will
>> >>> increase sparsity for the Table API which ultimately hurts adoption.
>> >>>
>> >>> With regards to the syntax, I have a preference to solve this via the
>> >>> connector options (e.g. like you can currently specify things as
>> >>> scan.startup.specific-offsets or scan.bounded.mode for the Kafka
>> >>> connector). You could still use the dynamic table options to
>> override/add
>> >>> them.
>> >>>
>> >>> Best regards,
>> >>>
>> >>> Martijn
>> >>>
>> >>> On Thu, Feb 23, 2023 at 7:21 AM Shammon FY <zjur...@gmail.com> wrote:
>> >>>
>> >>>> Hi kui
>> >>>>
>> >>>> Thanks for your answer and +1 to yuxia too
>> >>>>
>> >>>>> we should not bind the watermark-related options to a connector to
>> >>> ensure
>> >>>> semantic clarity.
>> >>>>
>> >>>> In my opinion, adding watermark-related options to a connector is
>> much
>> >>> more
>> >>>> clear. Currently users can define simple watermark strategy in DDL,
>> >>> adding
>> >>>> more configuration items in connector options is easy to understand
>> >>>>
>> >>>> Best,
>> >>>> Shammon
>> >>>>
>> >>>>
>> >>>> On Thu, Feb 23, 2023 at 10:52 AM Jingsong Li <jingsongl...@gmail.com
>> >
>> >>>> wrote:
>> >>>>
>> >>>>> Thanks for your proposal.
>> >>>>>
>> >>>>> +1 to yuxia, consider watermark-related hints as option hints.
>> >>>>>
>> >>>>> Personally, I am cautious about adding SQL syntax, WATERMARK_PARAMS
>> is
>> >>>>> also SQL syntax to some extent.
>> >>>>>
>> >>>>> We can use OPTIONS to meet this requirement if possible.
>> >>>>>
>> >>>>> Best,
>> >>>>> Jingsong
>> >>>>>
>> >>>>> On Thu, Feb 23, 2023 at 10:41 AM yuxia <luoyu...@alumni.sjtu.edu.cn
>> >
>> >>>>> wrote:
>> >>>>>>
>> >>>>>> Hi, Yuan Kui.
>> >>>>>> Thanks for driving it.
>> >>>>>> IMO, the 'OPTIONS' hint may be not only specific to the connector
>> >>>>> options. Just as a reference, we also have `sink.parallelism`[1] as
>> a
>> >>>>> connector options. It enables
>> >>>>>> user to specific the writer's parallelism dynamically per-query.
>> >>>>>>
>> >>>>>> Personally, I perfer to consider watermark-related hints as option
>> >>>>> hints. So, user can define a default watermark strategy for the
>> table,
>> >>>> and
>> >>>>> if user dosen't needed to changes it, they need to do nothing in
>> their
>> >>>>> query instead of specific it ervery time.
>> >>>>>>
>> >>>>>> [1]
>> >>>>>
>> >>>>
>> >>>
>> https://nightlies.apache.org/flink/flink-docs-master/zh/docs/connectors/table/filesystem/#sink-parallelism
>> >>>>>>
>> >>>>>> Best regards,
>> >>>>>> Yuxia
>> >>>>>>
>> >>>>>> Best regards,
>> >>>>>> Yuxia
>> >>>>>>
>> >>>>>> ----- 原始邮件 -----
>> >>>>>> 发件人: "kui yuan" <catye...@gmail.com>
>> >>>>>> 收件人: "dev" <dev@flink.apache.org>
>> >>>>>> 抄送: "Jark Wu" <imj...@gmail.com>
>> >>>>>> 发送时间: 星期三, 2023年 2 月 22日 下午 10:08:11
>> >>>>>> 主题: Re: [DISCUSS] FLIP-296: Watermark options for table API & SQL
>> >>>>>>
>> >>>>>> Hi all,
>> >>>>>>
>> >>>>>> Thanks for the lively discussion and I will respond to these
>> >>> questions
>> >>>>> one
>> >>>>>> by one. However, there are also some common questions and I will
>> >>> answer
>> >>>>>> together.
>> >>>>>>
>> >>>>>> @郑 Thanks for your reply. The features mentioned in this flip are
>> >>> only
>> >>>>> for
>> >>>>>> those source connectors that implement the
>> SupportsWatermarkPushDown
>> >>>>>> interface, generating watermarks in other graph locations is not in
>> >>> the
>> >>>>>> scope of this discussion. Perhaps another flip can be proposed
>> >>> later to
>> >>>>>> implement this feature.
>> >>>>>>
>> >>>>>> @Shammon Thanks for your reply. In Flip-296, a rejected alternative
>> >>> is
>> >>>>>> adding watermark related options in the connector options,we
>> believe
>> >>>> that
>> >>>>>> we should not bind the watermark-related options to a connector to
>> >>>> ensure
>> >>>>>> semantic clarity.
>> >>>>>>
>> >>>>>>> What will happen if we add watermark related options in `the
>> >>>> connector
>> >>>>>>> options`? Will the connector ignore these options or throw an
>> >>>>> exception?
>> >>>>>>> How can we support this?
>> >>>>>>
>> >>>>>> If user defines different watermark configurations for one table in
>> >>> two
>> >>>>>> places, I tend to prefer the first place would prevail, but  we can
>> >>>> also
>> >>>>>> throw exception or just print logs to prompt the user, which are
>> >>>>>> implementation details.
>> >>>>>>
>> >>>>>>> If one table is used by two operators with different watermark
>> >>>> params,
>> >>>>>>> what will happen?
>> >>>>>>
>> >>>>>> @Martijn Thanks for your reply.  I'm sorry that we are not
>> >>> particularly
>> >>>>>> accurate, this hint is mainly for SQL,  not table API.
>> >>>>>>
>> >>>>>>> While the FLIP talks about watermark options for Table API & SQL,
>> >>> I
>> >>>>> only
>> >>>>>>> see proposed syntax for SQL, not for the Table API. What is your
>> >>>>> proposal
>> >>>>>>> for the Table API
>> >>>>>>
>> >>>>>> @Jane Thanks for your reply. For the first question, If the user
>> >>> uses
>> >>>>> this
>> >>>>>> hint on those sourse that does not implement the
>> >>>>> SupportsWatermarkPushDown
>> >>>>>> interface, it will be completely invalid. The task will run as
>> >>> normal
>> >>>> as
>> >>>>> if
>> >>>>>> the hint had not been used.
>> >>>>>>
>> >>>>>>> What's the behavior if there are multiple table sources, among
>> >>> which
>> >>>>>>> some do not support `SupportsWatermarkPushDown`?
>> >>>>>>
>> >>>>>> @Jane feedback that 'WATERMARK_PARAMS' is difficult to remember,
>> >>>> perhaps
>> >>>>>> the naming issue can be put to the end of the discussion, because
>> >>> more
>> >>>>>> people like @Martijn @Shuo are considering whether these
>> >>> configurations
>> >>>>>> should be put into the DDL or the 'OPTIONS' hint. Here's what I
>> >>>>>> think, Putting these configs into DDL or putting them into
>> 'OPTIONS'
>> >>>> hint
>> >>>>>> is actually the same thing, because the 'OPTIONS' hint is mainly
>> >>> used
>> >>>> to
>> >>>>>> configure the properties of conenctor. The reason why I want to use
>> >>> a
>> >>>> new
>> >>>>>> hint is to make sure the semantics clear, in my opinion the
>> >>>> configuration
>> >>>>>> of watermark should not be mixed up with connector. However, a new
>> >>> hint
>> >>>>>> does make it more difficult to use to some extent, for example,
>> >>> when a
>> >>>>> user
>> >>>>>> uses both 'OPTIONS' hint and 'WATERMARK_PARAMS' hint. For this
>> >>> point,
>> >>>>> maby
>> >>>>>> it is more appropriate to use uniform 'OPTIONS' hint.
>> >>>>>> On the other hand, if we enrich more watermark option keys in
>> >>> 'OPTIONS'
>> >>>>>> hints, The question will be what we treat the definatrions of
>> >>>> 'OPTIONS'
>> >>>>>> hint, is this only specific to the connector options or could be
>> >>> more?
>> >>>>>> Maybe @Jark could share more insights here. In my opion, 'OPTIONS'
>> >>> is
>> >>>>> only
>> >>>>>> related to the connector options, which is not like the gernal
>> >>>> watermark
>> >>>>>> options.
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> Shuo Cheng <njucs...@gmail.com> 于2023年2月22日周三 19:17写道:
>> >>>>>>
>> >>>>>>> Hi Kui,
>> >>>>>>>
>> >>>>>>> Thanks for driving the discussion. It's quite useful to introduce
>> >>>>> Watermark
>> >>>>>>> options. I have some questions:
>> >>>>>>>
>> >>>>>>> What kind of hints is "WATERMARK_PARAMS"?
>> >>>>>>> Currently, we have two kinds of hints in Flink: Dynamic Table
>> >>>> Options &
>> >>>>>>> Query Hints. As described in the Flip, "WATERMARK_PARAMS" is more
>> >>>> like
>> >>>>>>> Dynamic Table Options. So two questions arise here:
>> >>>>>>>
>> >>>>>>> 1)  Are these watermark options to be exposed as connector WITH
>> >>>>> options? Aa
>> >>>>>>> described in SQL Hints doc[1],  "Dynamic Table Options allow to
>> >>>>> specify or
>> >>>>>>> override table options dynamically", which implies that these
>> >>> options
>> >>>>> can
>> >>>>>>> also be configured in WITH options.
>> >>>>>>>
>> >>>>>>> 2)  Do we really need a new hint name like 'WATERMARK_PARAMS',
>> >>> table
>> >>>>>>> options use "OPTIONS" as hint name, like '/*+
>> >>>>>>> OPTIONS('csv.ignore-parse-errors'='true') */', maybe we can enrich
>> >>>> more
>> >>>>>>> table option keys for watermark, e.g., /*+
>> >>>>>>> OPTIONS('watermark.emit-strategy'='ON_PERIODIC') */.
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> [1]
>> >>>>>>>
>> >>>>>>>
>> >>>>>
>> >>>>
>> >>>
>> https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/dev/table/sql/queries/hints/
>> >>>>>>>
>> >>>>>>> On Wed, Feb 22, 2023 at 10:22 AM kui yuan <catye...@gmail.com>
>> >>>> wrote:
>> >>>>>>>
>> >>>>>>>> Hi devs,
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> I'd like to start a discussion thread for FLIP-296[1]. This
>> >>> comes
>> >>>>> from an
>> >>>>>>>> offline discussion with @Yun Tang, and we hope to enrich table
>> >>> API
>> >>>> &
>> >>>>> SQL
>> >>>>>>> to
>> >>>>>>>> support many watermark-related features which were only
>> >>> implemented
>> >>>>> at
>> >>>>>>> the
>> >>>>>>>> datastream API level.
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> Basically, we want to introduce watermark options in table API &
>> >>>> SQL
>> >>>>> via
>> >>>>>>>> SQL hint named 'WATERMARK_PARAMS' to support features:
>> >>>>>>>>
>> >>>>>>>> 1、Configurable watermark emit strategy
>> >>>>>>>>
>> >>>>>>>> 2、Dealing with idle sources
>> >>>>>>>>
>> >>>>>>>> 3、Watermark alignment
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> Last but not least, thanks to Qingsheng and Jing Zhang for the
>> >>>>> initial
>> >>>>>>>> reviews.
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> Looking forward to your thoughts and any feedback is
>> >>> appreciated!
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> [1]
>> >>>>>>>>
>> >>>>>>>
>> >>>>>
>> >>>>
>> >>>
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=240884405
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> Best
>> >>>>>>>>
>> >>>>>>>> Yuan Kui
>> >>>>>>>>
>> >>>>>>>
>> >>>>>
>> >>>>
>> >>>
>> >>
>> >
>>
>>

Reply via email to