Sorry, I forgot to remind you that Timo's concern about the changes to the CompiledPlan looks like is still not covered in the FLIP.
Best, Jark On Wed, 1 Mar 2023 at 19:28, Jark Wu <imj...@gmail.com> wrote: > Hi Kui, > > Thank you for the great proposal, I think this is already in a good shape. > > Just a kind reminder, according to the community guidelines[1], > if there are unresponsive reviewers, a typical reasonable time > to wait for responses is one week, but be pragmatic about it. > > Regarding the FLIP, I have some comments below: > > 1. IIRC, this is the first time we introduce the framework-level connector > options that the option is not recognized and handled by connectors. > The FLIP should cover how framework filters the watermark related options > to avoid discover connector factory failed, and what happens if the > connector > already supported the conflict options. > > 2. I'm not sure about the usage scenarios of event gap emit strategy. Do > you have any specific use case of this strategy? I'm confused why no one > requested this strategy before no matter in DataStream or SQL, but maybe > I missed something. I'm not against to add this option, but just want to > be > careful when adding new API because it's hard to remove in the future. > > > 3. Adding a "Public Interface"[2] section to summarize the > proposed APIs and options would be better for developers to > know the impact. Currently, the APIs are scattered in the long > design sections. > > Best, > Jark > > > [1]: > https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals > [2]: https://cwiki.apache.org/confluence/display/FLINK/FLIP+Template > > On Wed, 1 Mar 2023 at 16:56, Kui Yuan <catye...@gmail.com> wrote: > >> Hi all, >> >> Thanks for all discussions! >> >> Anyone else have questions or suggestions? if not, I will start a vote >> thread later. >> >> Best >> Kui Yuan >> >> kui yuan <catye...@gmail.com> 于2023年2月27日周一 20:21写道: >> >> > Hi Timo, >> > >> > Thanks for your advice. I totally agree with your suggestion of naming >> > convention, I will rename these options and update the flip later, >> thanks >> > very much. >> > >> > In our internal implementation we had put these options inside the >> > `FactoryUtil`, just as you expect. We have also taken into account the >> > changes to the CompiledPlan and we have packaged these options >> > appropriately to minimize intrusiveness and ensure the compatibility to >> the >> > `WatermarkPushDownSpec`. >> > >> > > A hint to the implementation: I would suggest that we add those >> options >> > > to `FactoryUtil`. All cross-connector options should end up there. >> > >> > >> > > Please also consider the changes to the CompiledPlan in your FLIP. >> This >> > > change has implications on the JSON format as watermark strategy of >> > > ExecNode becomes more complex, see WatermarkPushDownSpec >> > >> > Best >> > Kui Yuan >> > >> > Timo Walther <twal...@apache.org> 于2023年2月27日周一 18:05写道: >> > >> >> Hi Kui Yuan, >> >> >> >> thanks for working on this FLIP. Let me also give some comments about >> >> the proposed changes. >> >> >> >> I support the direction of this FLIP about handling these >> >> watermark-specific properties through options and /*+OPTIONS(...) */ >> >> hints. >> >> >> >> Regarding naming, I would like to keep the options in sync with >> existing >> >> options: >> >> >> >> > 'watermark.emit.strategy'='ON_EVENT' >> >> >> >> Let's use lower case (e.g. `on-event`) that matches with properties >> like >> >> sink.partitioner [1] or sink.delivery-guarantee [2]. >> >> >> >> > 'source.idle-timeout'='1min' >> >> >> >> According to FLIP-122 [3], we want to prefix all scan-source related >> >> properties with `scan.*`. This clearly includes idle-timeout and >> >> actually also watermark strategies which don't apply for lookup >> sources. >> >> >> >> Summarizing the comments above, we should use the following options: >> >> >> >> 'scan.watermark.emit.strategy'='on-event', >> >> 'scan.watermark.emit.on-event.gap'='10000', >> >> 'scan.idle-timeout'='1min', >> >> 'scan.watermark.alignment.group'='alignment-group-1', >> >> 'scan.watermark.alignment.max-drift'='1min', >> >> 'scan.watermark.alignment.update-interval'='1s' >> >> >> >> I know that this makes the keys even longer, but given that those >> >> options are for power users this should be acceptable. It also clearly >> >> indicates which options are for sinks, scans, and lookups. This >> >> potentially also helps in allow lists. >> >> >> >> A hint to the implementation: I would suggest that we add those options >> >> to `FactoryUtil`. All cross-connector options should end up there. >> >> >> >> Please also consider the changes to the CompiledPlan in your FLIP. This >> >> change has implications on the JSON format as watermark strategy of >> >> ExecNode becomes more complex, see WatermarkPushDownSpec [4]. >> >> >> >> Regards, >> >> Timo >> >> >> >> >> >> [1] >> >> >> >> >> https://nightlies.apache.org/flink/flink-docs-release-1.11/dev/table/connectors/kafka.html#sink-partitioner >> >> [2] >> >> >> >> >> https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/connectors/table/kafka/#sink-delivery-guarantee >> >> [3] >> >> >> >> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-122%3A+New+Connector+Property+Keys+for+New+Factory >> >> [4] >> >> >> >> >> https://github.com/apache/flink/blob/master/flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/plan/abilities/source/WatermarkPushDownSpec.java >> >> >> >> >> >> On 24.02.23 04:55, kui yuan wrote: >> >> > Hi all, >> >> > >> >> > I have updated the flip according to the discussion, and we will >> extend >> >> the >> >> > watermark-related features with both table options and 'OPTIONS' >> hint, >> >> like >> >> > this: >> >> > >> >> > ``` >> >> > -- configure in table options >> >> > CREATE TABLE user_actions ( >> >> > ... >> >> > user_action_time TIMESTAMP(3), >> >> > WATERMARK FOR user_action_time AS user_action_time - INTERVAL '5' >> >> SECOND >> >> > ) WITH ( >> >> > 'watermark.emit.strategy'='ON_PERIODIC', >> >> > ... >> >> > ); >> >> > >> >> > -- use 'OPTIONS' hint >> >> > select ... from source_table /*+ OPTIONS('watermark.emit.strategy'= >> >> > 'ON_PERIODIC') */ >> >> > ``` >> >> > >> >> > Does everybody have any other questions? >> >> > >> >> > Best >> >> > Kui Yuan >> >> > >> >> > kui yuan <catye...@gmail.com> 于2023年2月23日周四 20:05写道: >> >> > >> >> >> Hi all, >> >> >> >> >> >> Thanks for all suggestions. >> >> >> >> >> >> We will extend the watermark-related features in SQL layer with >> dynamic >> >> >> table options and 'OPTIONS' hint, just as everyone expects. I will >> >> modify >> >> >> Flip-296 as discussed. >> >> >> >> >> >> @Martijn As far as I know, there is no hint interface in the table >> API, >> >> >> so we can't use hint in table API directly. if we need to extend the >> >> hint >> >> >> interface in the table API, maybe we need another flip. However, if >> we >> >> >> extend the watermark-related features in the dynamic table options, >> >> maybe >> >> >> we are able to use them indirectly in the table API like this[1]: >> >> >> >> >> >> ``` >> >> >> // register a table named "Orders" >> >> >> tableEnv.executeSql("CREATE TABLE Orders (`user` BIGINT, product >> >> STRING, >> >> >> amount INT) WITH ('watermark.emit.strategy'='ON_EVENT'...)"); >> >> >> ``` >> >> >> >> >> >> [1] >> >> >> >> >> >> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/create/ >> >> >> >> >> >> Best >> >> >> Kui Yuan >> >> >> >> >> >> Yun Tang <myas...@live.com> 于2023年2月23日周四 17:46写道: >> >> >> >> >> >>> Thanks for the warm discussions! >> >> >>> >> >> >>> I had an offline discussion with Kui about the replies. I think I >> >> could >> >> >>> give some explanations on the original intention to introduce >> another >> >> >>> WATERMARK_PARAMS. If we take a look at the current datastream API, >> the >> >> >>> watermark strategy does not belong to any specific connector. And >> we >> >> >>> thought the dynamic table options were more like the configurations >> >> within >> >> >>> some specific connector. >> >> >>> >> >> >>> From the review comments, I think most people feel good to make it >> >> part >> >> >>> of the dynamic table options. I think this is fine if we give more >> >> clear >> >> >>> scope definition of the dynamic table options here. And I also >> agree >> >> with >> >> >>> Jingsong's concern about adding SQL syntax which is the most >> >> concerning >> >> >>> part before launching this discussion. >> >> >>> >> >> >>> For Martijn's concern, if we accept to make the watermark-related >> >> options >> >> >>> part of dynamic table options, the problem becomes another topic: >> how >> >> to >> >> >>> support the dynamic table options in table API, which is deserved >> to >> >> create >> >> >>> another FLIP. >> >> >>> >> >> >>> Best >> >> >>> Yun Tang >> >> >>> ________________________________ >> >> >>> From: Martijn Visser <martijnvis...@apache.org> >> >> >>> Sent: Thursday, February 23, 2023 17:14 >> >> >>> To: dev@flink.apache.org <dev@flink.apache.org> >> >> >>> Subject: Re: [DISCUSS] FLIP-296: Watermark options for table API & >> SQL >> >> >>> >> >> >>> Hi, >> >> >>> >> >> >>> While I can understand that there's a desire to first focus on >> solving >> >> >>> this >> >> >>> problem for SQL, I do wonder if we should ignore the Table API at >> this >> >> >>> point. If we could include the syntax for the Table API, it >> >> potentially >> >> >>> could also be implemented by another contributor without needing to >> >> create >> >> >>> another FLIP. If we don't design it right now, my concern is that >> this >> >> >>> will >> >> >>> increase sparsity for the Table API which ultimately hurts >> adoption. >> >> >>> >> >> >>> With regards to the syntax, I have a preference to solve this via >> the >> >> >>> connector options (e.g. like you can currently specify things as >> >> >>> scan.startup.specific-offsets or scan.bounded.mode for the Kafka >> >> >>> connector). You could still use the dynamic table options to >> >> override/add >> >> >>> them. >> >> >>> >> >> >>> Best regards, >> >> >>> >> >> >>> Martijn >> >> >>> >> >> >>> On Thu, Feb 23, 2023 at 7:21 AM Shammon FY <zjur...@gmail.com> >> wrote: >> >> >>> >> >> >>>> Hi kui >> >> >>>> >> >> >>>> Thanks for your answer and +1 to yuxia too >> >> >>>> >> >> >>>>> we should not bind the watermark-related options to a connector >> to >> >> >>> ensure >> >> >>>> semantic clarity. >> >> >>>> >> >> >>>> In my opinion, adding watermark-related options to a connector is >> >> much >> >> >>> more >> >> >>>> clear. Currently users can define simple watermark strategy in >> DDL, >> >> >>> adding >> >> >>>> more configuration items in connector options is easy to >> understand >> >> >>>> >> >> >>>> Best, >> >> >>>> Shammon >> >> >>>> >> >> >>>> >> >> >>>> On Thu, Feb 23, 2023 at 10:52 AM Jingsong Li < >> jingsongl...@gmail.com >> >> > >> >> >>>> wrote: >> >> >>>> >> >> >>>>> Thanks for your proposal. >> >> >>>>> >> >> >>>>> +1 to yuxia, consider watermark-related hints as option hints. >> >> >>>>> >> >> >>>>> Personally, I am cautious about adding SQL syntax, >> WATERMARK_PARAMS >> >> is >> >> >>>>> also SQL syntax to some extent. >> >> >>>>> >> >> >>>>> We can use OPTIONS to meet this requirement if possible. >> >> >>>>> >> >> >>>>> Best, >> >> >>>>> Jingsong >> >> >>>>> >> >> >>>>> On Thu, Feb 23, 2023 at 10:41 AM yuxia < >> luoyu...@alumni.sjtu.edu.cn >> >> > >> >> >>>>> wrote: >> >> >>>>>> >> >> >>>>>> Hi, Yuan Kui. >> >> >>>>>> Thanks for driving it. >> >> >>>>>> IMO, the 'OPTIONS' hint may be not only specific to the >> connector >> >> >>>>> options. Just as a reference, we also have `sink.parallelism`[1] >> as >> >> a >> >> >>>>> connector options. It enables >> >> >>>>>> user to specific the writer's parallelism dynamically per-query. >> >> >>>>>> >> >> >>>>>> Personally, I perfer to consider watermark-related hints as >> option >> >> >>>>> hints. So, user can define a default watermark strategy for the >> >> table, >> >> >>>> and >> >> >>>>> if user dosen't needed to changes it, they need to do nothing in >> >> their >> >> >>>>> query instead of specific it ervery time. >> >> >>>>>> >> >> >>>>>> [1] >> >> >>>>> >> >> >>>> >> >> >>> >> >> >> https://nightlies.apache.org/flink/flink-docs-master/zh/docs/connectors/table/filesystem/#sink-parallelism >> >> >>>>>> >> >> >>>>>> Best regards, >> >> >>>>>> Yuxia >> >> >>>>>> >> >> >>>>>> Best regards, >> >> >>>>>> Yuxia >> >> >>>>>> >> >> >>>>>> ----- 原始邮件 ----- >> >> >>>>>> 发件人: "kui yuan" <catye...@gmail.com> >> >> >>>>>> 收件人: "dev" <dev@flink.apache.org> >> >> >>>>>> 抄送: "Jark Wu" <imj...@gmail.com> >> >> >>>>>> 发送时间: 星期三, 2023年 2 月 22日 下午 10:08:11 >> >> >>>>>> 主题: Re: [DISCUSS] FLIP-296: Watermark options for table API & >> SQL >> >> >>>>>> >> >> >>>>>> Hi all, >> >> >>>>>> >> >> >>>>>> Thanks for the lively discussion and I will respond to these >> >> >>> questions >> >> >>>>> one >> >> >>>>>> by one. However, there are also some common questions and I will >> >> >>> answer >> >> >>>>>> together. >> >> >>>>>> >> >> >>>>>> @郑 Thanks for your reply. The features mentioned in this flip >> are >> >> >>> only >> >> >>>>> for >> >> >>>>>> those source connectors that implement the >> >> SupportsWatermarkPushDown >> >> >>>>>> interface, generating watermarks in other graph locations is >> not in >> >> >>> the >> >> >>>>>> scope of this discussion. Perhaps another flip can be proposed >> >> >>> later to >> >> >>>>>> implement this feature. >> >> >>>>>> >> >> >>>>>> @Shammon Thanks for your reply. In Flip-296, a rejected >> alternative >> >> >>> is >> >> >>>>>> adding watermark related options in the connector options,we >> >> believe >> >> >>>> that >> >> >>>>>> we should not bind the watermark-related options to a connector >> to >> >> >>>> ensure >> >> >>>>>> semantic clarity. >> >> >>>>>> >> >> >>>>>>> What will happen if we add watermark related options in `the >> >> >>>> connector >> >> >>>>>>> options`? Will the connector ignore these options or throw an >> >> >>>>> exception? >> >> >>>>>>> How can we support this? >> >> >>>>>> >> >> >>>>>> If user defines different watermark configurations for one >> table in >> >> >>> two >> >> >>>>>> places, I tend to prefer the first place would prevail, but we >> can >> >> >>>> also >> >> >>>>>> throw exception or just print logs to prompt the user, which are >> >> >>>>>> implementation details. >> >> >>>>>> >> >> >>>>>>> If one table is used by two operators with different watermark >> >> >>>> params, >> >> >>>>>>> what will happen? >> >> >>>>>> >> >> >>>>>> @Martijn Thanks for your reply. I'm sorry that we are not >> >> >>> particularly >> >> >>>>>> accurate, this hint is mainly for SQL, not table API. >> >> >>>>>> >> >> >>>>>>> While the FLIP talks about watermark options for Table API & >> SQL, >> >> >>> I >> >> >>>>> only >> >> >>>>>>> see proposed syntax for SQL, not for the Table API. What is >> your >> >> >>>>> proposal >> >> >>>>>>> for the Table API >> >> >>>>>> >> >> >>>>>> @Jane Thanks for your reply. For the first question, If the user >> >> >>> uses >> >> >>>>> this >> >> >>>>>> hint on those sourse that does not implement the >> >> >>>>> SupportsWatermarkPushDown >> >> >>>>>> interface, it will be completely invalid. The task will run as >> >> >>> normal >> >> >>>> as >> >> >>>>> if >> >> >>>>>> the hint had not been used. >> >> >>>>>> >> >> >>>>>>> What's the behavior if there are multiple table sources, among >> >> >>> which >> >> >>>>>>> some do not support `SupportsWatermarkPushDown`? >> >> >>>>>> >> >> >>>>>> @Jane feedback that 'WATERMARK_PARAMS' is difficult to remember, >> >> >>>> perhaps >> >> >>>>>> the naming issue can be put to the end of the discussion, >> because >> >> >>> more >> >> >>>>>> people like @Martijn @Shuo are considering whether these >> >> >>> configurations >> >> >>>>>> should be put into the DDL or the 'OPTIONS' hint. Here's what I >> >> >>>>>> think, Putting these configs into DDL or putting them into >> >> 'OPTIONS' >> >> >>>> hint >> >> >>>>>> is actually the same thing, because the 'OPTIONS' hint is mainly >> >> >>> used >> >> >>>> to >> >> >>>>>> configure the properties of conenctor. The reason why I want to >> use >> >> >>> a >> >> >>>> new >> >> >>>>>> hint is to make sure the semantics clear, in my opinion the >> >> >>>> configuration >> >> >>>>>> of watermark should not be mixed up with connector. However, a >> new >> >> >>> hint >> >> >>>>>> does make it more difficult to use to some extent, for example, >> >> >>> when a >> >> >>>>> user >> >> >>>>>> uses both 'OPTIONS' hint and 'WATERMARK_PARAMS' hint. For this >> >> >>> point, >> >> >>>>> maby >> >> >>>>>> it is more appropriate to use uniform 'OPTIONS' hint. >> >> >>>>>> On the other hand, if we enrich more watermark option keys in >> >> >>> 'OPTIONS' >> >> >>>>>> hints, The question will be what we treat the definatrions of >> >> >>>> 'OPTIONS' >> >> >>>>>> hint, is this only specific to the connector options or could be >> >> >>> more? >> >> >>>>>> Maybe @Jark could share more insights here. In my opion, >> 'OPTIONS' >> >> >>> is >> >> >>>>> only >> >> >>>>>> related to the connector options, which is not like the gernal >> >> >>>> watermark >> >> >>>>>> options. >> >> >>>>>> >> >> >>>>>> >> >> >>>>>> >> >> >>>>>> Shuo Cheng <njucs...@gmail.com> 于2023年2月22日周三 19:17写道: >> >> >>>>>> >> >> >>>>>>> Hi Kui, >> >> >>>>>>> >> >> >>>>>>> Thanks for driving the discussion. It's quite useful to >> introduce >> >> >>>>> Watermark >> >> >>>>>>> options. I have some questions: >> >> >>>>>>> >> >> >>>>>>> What kind of hints is "WATERMARK_PARAMS"? >> >> >>>>>>> Currently, we have two kinds of hints in Flink: Dynamic Table >> >> >>>> Options & >> >> >>>>>>> Query Hints. As described in the Flip, "WATERMARK_PARAMS" is >> more >> >> >>>> like >> >> >>>>>>> Dynamic Table Options. So two questions arise here: >> >> >>>>>>> >> >> >>>>>>> 1) Are these watermark options to be exposed as connector WITH >> >> >>>>> options? Aa >> >> >>>>>>> described in SQL Hints doc[1], "Dynamic Table Options allow to >> >> >>>>> specify or >> >> >>>>>>> override table options dynamically", which implies that these >> >> >>> options >> >> >>>>> can >> >> >>>>>>> also be configured in WITH options. >> >> >>>>>>> >> >> >>>>>>> 2) Do we really need a new hint name like 'WATERMARK_PARAMS', >> >> >>> table >> >> >>>>>>> options use "OPTIONS" as hint name, like '/*+ >> >> >>>>>>> OPTIONS('csv.ignore-parse-errors'='true') */', maybe we can >> enrich >> >> >>>> more >> >> >>>>>>> table option keys for watermark, e.g., /*+ >> >> >>>>>>> OPTIONS('watermark.emit-strategy'='ON_PERIODIC') */. >> >> >>>>>>> >> >> >>>>>>> >> >> >>>>>>> [1] >> >> >>>>>>> >> >> >>>>>>> >> >> >>>>> >> >> >>>> >> >> >>> >> >> >> https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/dev/table/sql/queries/hints/ >> >> >>>>>>> >> >> >>>>>>> On Wed, Feb 22, 2023 at 10:22 AM kui yuan <catye...@gmail.com> >> >> >>>> wrote: >> >> >>>>>>> >> >> >>>>>>>> Hi devs, >> >> >>>>>>>> >> >> >>>>>>>> >> >> >>>>>>>> I'd like to start a discussion thread for FLIP-296[1]. This >> >> >>> comes >> >> >>>>> from an >> >> >>>>>>>> offline discussion with @Yun Tang, and we hope to enrich table >> >> >>> API >> >> >>>> & >> >> >>>>> SQL >> >> >>>>>>> to >> >> >>>>>>>> support many watermark-related features which were only >> >> >>> implemented >> >> >>>>> at >> >> >>>>>>> the >> >> >>>>>>>> datastream API level. >> >> >>>>>>>> >> >> >>>>>>>> >> >> >>>>>>>> Basically, we want to introduce watermark options in table >> API & >> >> >>>> SQL >> >> >>>>> via >> >> >>>>>>>> SQL hint named 'WATERMARK_PARAMS' to support features: >> >> >>>>>>>> >> >> >>>>>>>> 1、Configurable watermark emit strategy >> >> >>>>>>>> >> >> >>>>>>>> 2、Dealing with idle sources >> >> >>>>>>>> >> >> >>>>>>>> 3、Watermark alignment >> >> >>>>>>>> >> >> >>>>>>>> >> >> >>>>>>>> Last but not least, thanks to Qingsheng and Jing Zhang for the >> >> >>>>> initial >> >> >>>>>>>> reviews. >> >> >>>>>>>> >> >> >>>>>>>> >> >> >>>>>>>> Looking forward to your thoughts and any feedback is >> >> >>> appreciated! >> >> >>>>>>>> >> >> >>>>>>>> >> >> >>>>>>>> [1] >> >> >>>>>>>> >> >> >>>>>>> >> >> >>>>> >> >> >>>> >> >> >>> >> >> >> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=240884405 >> >> >>>>>>>> >> >> >>>>>>>> >> >> >>>>>>>> Best >> >> >>>>>>>> >> >> >>>>>>>> Yuan Kui >> >> >>>>>>>> >> >> >>>>>>> >> >> >>>>> >> >> >>>> >> >> >>> >> >> >> >> >> > >> >> >> >> >> >