Hi all, Thanks for all discussions!
Anyone else have questions or suggestions? if not, I will start a vote thread later. Best Kui Yuan kui yuan <catye...@gmail.com> 于2023年2月27日周一 20:21写道: > Hi Timo, > > Thanks for your advice. I totally agree with your suggestion of naming > convention, I will rename these options and update the flip later, thanks > very much. > > In our internal implementation we had put these options inside the > `FactoryUtil`, just as you expect. We have also taken into account the > changes to the CompiledPlan and we have packaged these options > appropriately to minimize intrusiveness and ensure the compatibility to the > `WatermarkPushDownSpec`. > > > A hint to the implementation: I would suggest that we add those options > > to `FactoryUtil`. All cross-connector options should end up there. > > > > Please also consider the changes to the CompiledPlan in your FLIP. This > > change has implications on the JSON format as watermark strategy of > > ExecNode becomes more complex, see WatermarkPushDownSpec > > Best > Kui Yuan > > Timo Walther <twal...@apache.org> 于2023年2月27日周一 18:05写道: > >> Hi Kui Yuan, >> >> thanks for working on this FLIP. Let me also give some comments about >> the proposed changes. >> >> I support the direction of this FLIP about handling these >> watermark-specific properties through options and /*+OPTIONS(...) */ >> hints. >> >> Regarding naming, I would like to keep the options in sync with existing >> options: >> >> > 'watermark.emit.strategy'='ON_EVENT' >> >> Let's use lower case (e.g. `on-event`) that matches with properties like >> sink.partitioner [1] or sink.delivery-guarantee [2]. >> >> > 'source.idle-timeout'='1min' >> >> According to FLIP-122 [3], we want to prefix all scan-source related >> properties with `scan.*`. This clearly includes idle-timeout and >> actually also watermark strategies which don't apply for lookup sources. >> >> Summarizing the comments above, we should use the following options: >> >> 'scan.watermark.emit.strategy'='on-event', >> 'scan.watermark.emit.on-event.gap'='10000', >> 'scan.idle-timeout'='1min', >> 'scan.watermark.alignment.group'='alignment-group-1', >> 'scan.watermark.alignment.max-drift'='1min', >> 'scan.watermark.alignment.update-interval'='1s' >> >> I know that this makes the keys even longer, but given that those >> options are for power users this should be acceptable. It also clearly >> indicates which options are for sinks, scans, and lookups. This >> potentially also helps in allow lists. >> >> A hint to the implementation: I would suggest that we add those options >> to `FactoryUtil`. All cross-connector options should end up there. >> >> Please also consider the changes to the CompiledPlan in your FLIP. This >> change has implications on the JSON format as watermark strategy of >> ExecNode becomes more complex, see WatermarkPushDownSpec [4]. >> >> Regards, >> Timo >> >> >> [1] >> >> https://nightlies.apache.org/flink/flink-docs-release-1.11/dev/table/connectors/kafka.html#sink-partitioner >> [2] >> >> https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/connectors/table/kafka/#sink-delivery-guarantee >> [3] >> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-122%3A+New+Connector+Property+Keys+for+New+Factory >> [4] >> >> https://github.com/apache/flink/blob/master/flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/plan/abilities/source/WatermarkPushDownSpec.java >> >> >> On 24.02.23 04:55, kui yuan wrote: >> > Hi all, >> > >> > I have updated the flip according to the discussion, and we will extend >> the >> > watermark-related features with both table options and 'OPTIONS' hint, >> like >> > this: >> > >> > ``` >> > -- configure in table options >> > CREATE TABLE user_actions ( >> > ... >> > user_action_time TIMESTAMP(3), >> > WATERMARK FOR user_action_time AS user_action_time - INTERVAL '5' >> SECOND >> > ) WITH ( >> > 'watermark.emit.strategy'='ON_PERIODIC', >> > ... >> > ); >> > >> > -- use 'OPTIONS' hint >> > select ... from source_table /*+ OPTIONS('watermark.emit.strategy'= >> > 'ON_PERIODIC') */ >> > ``` >> > >> > Does everybody have any other questions? >> > >> > Best >> > Kui Yuan >> > >> > kui yuan <catye...@gmail.com> 于2023年2月23日周四 20:05写道: >> > >> >> Hi all, >> >> >> >> Thanks for all suggestions. >> >> >> >> We will extend the watermark-related features in SQL layer with dynamic >> >> table options and 'OPTIONS' hint, just as everyone expects. I will >> modify >> >> Flip-296 as discussed. >> >> >> >> @Martijn As far as I know, there is no hint interface in the table API, >> >> so we can't use hint in table API directly. if we need to extend the >> hint >> >> interface in the table API, maybe we need another flip. However, if we >> >> extend the watermark-related features in the dynamic table options, >> maybe >> >> we are able to use them indirectly in the table API like this[1]: >> >> >> >> ``` >> >> // register a table named "Orders" >> >> tableEnv.executeSql("CREATE TABLE Orders (`user` BIGINT, product >> STRING, >> >> amount INT) WITH ('watermark.emit.strategy'='ON_EVENT'...)"); >> >> ``` >> >> >> >> [1] >> >> >> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/create/ >> >> >> >> Best >> >> Kui Yuan >> >> >> >> Yun Tang <myas...@live.com> 于2023年2月23日周四 17:46写道: >> >> >> >>> Thanks for the warm discussions! >> >>> >> >>> I had an offline discussion with Kui about the replies. I think I >> could >> >>> give some explanations on the original intention to introduce another >> >>> WATERMARK_PARAMS. If we take a look at the current datastream API, the >> >>> watermark strategy does not belong to any specific connector. And we >> >>> thought the dynamic table options were more like the configurations >> within >> >>> some specific connector. >> >>> >> >>> From the review comments, I think most people feel good to make it >> part >> >>> of the dynamic table options. I think this is fine if we give more >> clear >> >>> scope definition of the dynamic table options here. And I also agree >> with >> >>> Jingsong's concern about adding SQL syntax which is the most >> concerning >> >>> part before launching this discussion. >> >>> >> >>> For Martijn's concern, if we accept to make the watermark-related >> options >> >>> part of dynamic table options, the problem becomes another topic: how >> to >> >>> support the dynamic table options in table API, which is deserved to >> create >> >>> another FLIP. >> >>> >> >>> Best >> >>> Yun Tang >> >>> ________________________________ >> >>> From: Martijn Visser <martijnvis...@apache.org> >> >>> Sent: Thursday, February 23, 2023 17:14 >> >>> To: dev@flink.apache.org <dev@flink.apache.org> >> >>> Subject: Re: [DISCUSS] FLIP-296: Watermark options for table API & SQL >> >>> >> >>> Hi, >> >>> >> >>> While I can understand that there's a desire to first focus on solving >> >>> this >> >>> problem for SQL, I do wonder if we should ignore the Table API at this >> >>> point. If we could include the syntax for the Table API, it >> potentially >> >>> could also be implemented by another contributor without needing to >> create >> >>> another FLIP. If we don't design it right now, my concern is that this >> >>> will >> >>> increase sparsity for the Table API which ultimately hurts adoption. >> >>> >> >>> With regards to the syntax, I have a preference to solve this via the >> >>> connector options (e.g. like you can currently specify things as >> >>> scan.startup.specific-offsets or scan.bounded.mode for the Kafka >> >>> connector). You could still use the dynamic table options to >> override/add >> >>> them. >> >>> >> >>> Best regards, >> >>> >> >>> Martijn >> >>> >> >>> On Thu, Feb 23, 2023 at 7:21 AM Shammon FY <zjur...@gmail.com> wrote: >> >>> >> >>>> Hi kui >> >>>> >> >>>> Thanks for your answer and +1 to yuxia too >> >>>> >> >>>>> we should not bind the watermark-related options to a connector to >> >>> ensure >> >>>> semantic clarity. >> >>>> >> >>>> In my opinion, adding watermark-related options to a connector is >> much >> >>> more >> >>>> clear. Currently users can define simple watermark strategy in DDL, >> >>> adding >> >>>> more configuration items in connector options is easy to understand >> >>>> >> >>>> Best, >> >>>> Shammon >> >>>> >> >>>> >> >>>> On Thu, Feb 23, 2023 at 10:52 AM Jingsong Li <jingsongl...@gmail.com >> > >> >>>> wrote: >> >>>> >> >>>>> Thanks for your proposal. >> >>>>> >> >>>>> +1 to yuxia, consider watermark-related hints as option hints. >> >>>>> >> >>>>> Personally, I am cautious about adding SQL syntax, WATERMARK_PARAMS >> is >> >>>>> also SQL syntax to some extent. >> >>>>> >> >>>>> We can use OPTIONS to meet this requirement if possible. >> >>>>> >> >>>>> Best, >> >>>>> Jingsong >> >>>>> >> >>>>> On Thu, Feb 23, 2023 at 10:41 AM yuxia <luoyu...@alumni.sjtu.edu.cn >> > >> >>>>> wrote: >> >>>>>> >> >>>>>> Hi, Yuan Kui. >> >>>>>> Thanks for driving it. >> >>>>>> IMO, the 'OPTIONS' hint may be not only specific to the connector >> >>>>> options. Just as a reference, we also have `sink.parallelism`[1] as >> a >> >>>>> connector options. It enables >> >>>>>> user to specific the writer's parallelism dynamically per-query. >> >>>>>> >> >>>>>> Personally, I perfer to consider watermark-related hints as option >> >>>>> hints. So, user can define a default watermark strategy for the >> table, >> >>>> and >> >>>>> if user dosen't needed to changes it, they need to do nothing in >> their >> >>>>> query instead of specific it ervery time. >> >>>>>> >> >>>>>> [1] >> >>>>> >> >>>> >> >>> >> https://nightlies.apache.org/flink/flink-docs-master/zh/docs/connectors/table/filesystem/#sink-parallelism >> >>>>>> >> >>>>>> Best regards, >> >>>>>> Yuxia >> >>>>>> >> >>>>>> Best regards, >> >>>>>> Yuxia >> >>>>>> >> >>>>>> ----- 原始邮件 ----- >> >>>>>> 发件人: "kui yuan" <catye...@gmail.com> >> >>>>>> 收件人: "dev" <dev@flink.apache.org> >> >>>>>> 抄送: "Jark Wu" <imj...@gmail.com> >> >>>>>> 发送时间: 星期三, 2023年 2 月 22日 下午 10:08:11 >> >>>>>> 主题: Re: [DISCUSS] FLIP-296: Watermark options for table API & SQL >> >>>>>> >> >>>>>> Hi all, >> >>>>>> >> >>>>>> Thanks for the lively discussion and I will respond to these >> >>> questions >> >>>>> one >> >>>>>> by one. However, there are also some common questions and I will >> >>> answer >> >>>>>> together. >> >>>>>> >> >>>>>> @郑 Thanks for your reply. The features mentioned in this flip are >> >>> only >> >>>>> for >> >>>>>> those source connectors that implement the >> SupportsWatermarkPushDown >> >>>>>> interface, generating watermarks in other graph locations is not in >> >>> the >> >>>>>> scope of this discussion. Perhaps another flip can be proposed >> >>> later to >> >>>>>> implement this feature. >> >>>>>> >> >>>>>> @Shammon Thanks for your reply. In Flip-296, a rejected alternative >> >>> is >> >>>>>> adding watermark related options in the connector options,we >> believe >> >>>> that >> >>>>>> we should not bind the watermark-related options to a connector to >> >>>> ensure >> >>>>>> semantic clarity. >> >>>>>> >> >>>>>>> What will happen if we add watermark related options in `the >> >>>> connector >> >>>>>>> options`? Will the connector ignore these options or throw an >> >>>>> exception? >> >>>>>>> How can we support this? >> >>>>>> >> >>>>>> If user defines different watermark configurations for one table in >> >>> two >> >>>>>> places, I tend to prefer the first place would prevail, but we can >> >>>> also >> >>>>>> throw exception or just print logs to prompt the user, which are >> >>>>>> implementation details. >> >>>>>> >> >>>>>>> If one table is used by two operators with different watermark >> >>>> params, >> >>>>>>> what will happen? >> >>>>>> >> >>>>>> @Martijn Thanks for your reply. I'm sorry that we are not >> >>> particularly >> >>>>>> accurate, this hint is mainly for SQL, not table API. >> >>>>>> >> >>>>>>> While the FLIP talks about watermark options for Table API & SQL, >> >>> I >> >>>>> only >> >>>>>>> see proposed syntax for SQL, not for the Table API. What is your >> >>>>> proposal >> >>>>>>> for the Table API >> >>>>>> >> >>>>>> @Jane Thanks for your reply. For the first question, If the user >> >>> uses >> >>>>> this >> >>>>>> hint on those sourse that does not implement the >> >>>>> SupportsWatermarkPushDown >> >>>>>> interface, it will be completely invalid. The task will run as >> >>> normal >> >>>> as >> >>>>> if >> >>>>>> the hint had not been used. >> >>>>>> >> >>>>>>> What's the behavior if there are multiple table sources, among >> >>> which >> >>>>>>> some do not support `SupportsWatermarkPushDown`? >> >>>>>> >> >>>>>> @Jane feedback that 'WATERMARK_PARAMS' is difficult to remember, >> >>>> perhaps >> >>>>>> the naming issue can be put to the end of the discussion, because >> >>> more >> >>>>>> people like @Martijn @Shuo are considering whether these >> >>> configurations >> >>>>>> should be put into the DDL or the 'OPTIONS' hint. Here's what I >> >>>>>> think, Putting these configs into DDL or putting them into >> 'OPTIONS' >> >>>> hint >> >>>>>> is actually the same thing, because the 'OPTIONS' hint is mainly >> >>> used >> >>>> to >> >>>>>> configure the properties of conenctor. The reason why I want to use >> >>> a >> >>>> new >> >>>>>> hint is to make sure the semantics clear, in my opinion the >> >>>> configuration >> >>>>>> of watermark should not be mixed up with connector. However, a new >> >>> hint >> >>>>>> does make it more difficult to use to some extent, for example, >> >>> when a >> >>>>> user >> >>>>>> uses both 'OPTIONS' hint and 'WATERMARK_PARAMS' hint. For this >> >>> point, >> >>>>> maby >> >>>>>> it is more appropriate to use uniform 'OPTIONS' hint. >> >>>>>> On the other hand, if we enrich more watermark option keys in >> >>> 'OPTIONS' >> >>>>>> hints, The question will be what we treat the definatrions of >> >>>> 'OPTIONS' >> >>>>>> hint, is this only specific to the connector options or could be >> >>> more? >> >>>>>> Maybe @Jark could share more insights here. In my opion, 'OPTIONS' >> >>> is >> >>>>> only >> >>>>>> related to the connector options, which is not like the gernal >> >>>> watermark >> >>>>>> options. >> >>>>>> >> >>>>>> >> >>>>>> >> >>>>>> Shuo Cheng <njucs...@gmail.com> 于2023年2月22日周三 19:17写道: >> >>>>>> >> >>>>>>> Hi Kui, >> >>>>>>> >> >>>>>>> Thanks for driving the discussion. It's quite useful to introduce >> >>>>> Watermark >> >>>>>>> options. I have some questions: >> >>>>>>> >> >>>>>>> What kind of hints is "WATERMARK_PARAMS"? >> >>>>>>> Currently, we have two kinds of hints in Flink: Dynamic Table >> >>>> Options & >> >>>>>>> Query Hints. As described in the Flip, "WATERMARK_PARAMS" is more >> >>>> like >> >>>>>>> Dynamic Table Options. So two questions arise here: >> >>>>>>> >> >>>>>>> 1) Are these watermark options to be exposed as connector WITH >> >>>>> options? Aa >> >>>>>>> described in SQL Hints doc[1], "Dynamic Table Options allow to >> >>>>> specify or >> >>>>>>> override table options dynamically", which implies that these >> >>> options >> >>>>> can >> >>>>>>> also be configured in WITH options. >> >>>>>>> >> >>>>>>> 2) Do we really need a new hint name like 'WATERMARK_PARAMS', >> >>> table >> >>>>>>> options use "OPTIONS" as hint name, like '/*+ >> >>>>>>> OPTIONS('csv.ignore-parse-errors'='true') */', maybe we can enrich >> >>>> more >> >>>>>>> table option keys for watermark, e.g., /*+ >> >>>>>>> OPTIONS('watermark.emit-strategy'='ON_PERIODIC') */. >> >>>>>>> >> >>>>>>> >> >>>>>>> [1] >> >>>>>>> >> >>>>>>> >> >>>>> >> >>>> >> >>> >> https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/dev/table/sql/queries/hints/ >> >>>>>>> >> >>>>>>> On Wed, Feb 22, 2023 at 10:22 AM kui yuan <catye...@gmail.com> >> >>>> wrote: >> >>>>>>> >> >>>>>>>> Hi devs, >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> I'd like to start a discussion thread for FLIP-296[1]. This >> >>> comes >> >>>>> from an >> >>>>>>>> offline discussion with @Yun Tang, and we hope to enrich table >> >>> API >> >>>> & >> >>>>> SQL >> >>>>>>> to >> >>>>>>>> support many watermark-related features which were only >> >>> implemented >> >>>>> at >> >>>>>>> the >> >>>>>>>> datastream API level. >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> Basically, we want to introduce watermark options in table API & >> >>>> SQL >> >>>>> via >> >>>>>>>> SQL hint named 'WATERMARK_PARAMS' to support features: >> >>>>>>>> >> >>>>>>>> 1、Configurable watermark emit strategy >> >>>>>>>> >> >>>>>>>> 2、Dealing with idle sources >> >>>>>>>> >> >>>>>>>> 3、Watermark alignment >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> Last but not least, thanks to Qingsheng and Jing Zhang for the >> >>>>> initial >> >>>>>>>> reviews. >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> Looking forward to your thoughts and any feedback is >> >>> appreciated! >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> [1] >> >>>>>>>> >> >>>>>>> >> >>>>> >> >>>> >> >>> >> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=240884405 >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> Best >> >>>>>>>> >> >>>>>>>> Yuan Kui >> >>>>>>>> >> >>>>>>> >> >>>>> >> >>>> >> >>> >> >> >> > >> >>