Hi Jing, Thanks for your advice. In upcoming PR, I will explain all these things (Including flip-182,flip-217,etc.) clearly in the flink doc to make sure the user can understand the behavior behind it.
Best, Kui Jing Ge <j...@ververica.com.invalid> 于2023年3月7日周二 19:42写道: > Hi Kui, > > Thanks for adding it into the Flip. There is one more thing wrt this topic > you might want to pay attention to, a little bit off-topic, is that Flink > SQL users might not be familiar with use cases of low level Datastream API. > IMHO, it is highly recommended(mandatory) to write the dependency you just > described in your last email and all related information in Flink doc > during developing this FLIP within the upcoming PR. Without those > guidelines in doc, users might feel confused after using those OPTIONs, > since they might not be aware of what happens underneath, and therefore > don't know why it does not work even if they did everything right at Flink > SQL level. > > Best regards, > Jing > > > On Tue, Mar 7, 2023 at 6:36 AM Kui Yuan <catye...@gmail.com> wrote: > > > Hi Jing, > > > > > > Thanks for the reminder. The aim of this flip is letting the sql users to > > use those features in the Datastream API, we don't intend to extend > > flip-217. In my opinion, the watermark alignment with only one source can > > be configured by the options given in flip, and if the source connector > > does not implement flip-217, the task will run with an error, reminding > the > > user to use `pipeline.watermark-alignment.allow- > unaligned-source-splits`, > > but maybe these behaviors are not understood by everyone, I will add some > > tips about flip-217 in the flip to let users understand the behavior in > the > > case of source splits. > > > > > > Best, > > > > Kui Yuan > > > > Jing Ge <j...@ververica.com.invalid> 于2023年3月7日周二 04:23写道: > > > > > Hi Kui, > > > > > > Thanks for pointing that out. I knew FLIP-217 which was done by an > > > engineer working in my team. As far as I am concerned, your FLIP > should > > > answer the following questions: > > > > > > 1. How to enable the watermark alignment of a source splits with Flink > > SQL? > > > e.g. which options should be used if only one source is used? > > > > > > 2. Default behaviour. i.e. Flink SQL users should be aware that > watermark > > > alignment of source split will only work for sources that implement > > > FLIP-217 properly. Should users take care of > > > `pipeline.watermark-alignment.allow-unaligned-source-splits` > > > while using Flink SQL? > > > > > > Best regards, > > > Jing > > > > > > > > > On Fri, Mar 3, 2023 at 8:46 AM Kui Yuan <catye...@gmail.com> wrote: > > > > > > > Hi all, > > > > > > > > Thanks for all. There are more questions and I will answer one by > one. > > > > > > > > @Jark Thanks for your tips. For the first question, I will add more > > > details > > > > in the flip, and give a POC[1] so that pepole can know how I'm > > currently > > > > implementing these features. > > > > > > > > > IIRC, this is the first time we introduce the framework-level > > connector > > > > > options that the option is not recognized and handled by > connectors. > > > > > The FLIP should cover how framework filters the watermark related > > > options > > > > > to avoid discover connector factory failed, and what happens if the > > > > > connector already supported the conflict options > > > > > > > > For the second question, We know that the default strategy is > > > 'on-periodic' > > > > in SQL layer, and the default interval is 200ms. The reason for > emiting > > > > watermark periodically is that the time advancement of consecutive > > events > > > > may be very small, we don't need to calculate watermark for each > event. > > > > Same for 'on-event' strategy, so my idea is that we can set a fixed > gap > > > for > > > > 'on-event' strategy. > > > > > > > > > I'm not sure about the usage scenarios of event gap emit strategy. > Do > > > > > you have any specific use case of this strategy? I'm confused why > no > > > one > > > > > requested this strategy before no matter in DataStream or SQL, but > > > maybe > > > > > I missed something. I'm not against to add this option, but just > want > > > to > > > > be > > > > > careful when adding new API because it's hard to remove in the > > future. > > > > > > > > As @Timo said, There is no default features like 'on-event-gap' in > > > > DataStream API, but the users can achieve the 'on-event-gap' feature > by > > > > using `WatermarkGenerator` interface, just like the implemention in > my > > > > POC[1]. However, If we don't provide it in SQL layer, there is no > way > > > for > > > > users to use similar features. > > > > > > > > > Jark raised a very good point. I thought we only expose what is > > > > > contained in DataStream API already. If this strategy is not part > of > > > > > DataStream API, would like to exclude it from the FLIP. We need to > be > > > > > careful which strategies we offer by default. > > > > > > > > @Jark @Timo I'm sorry, perhaps I don't understand what are your > > concerns > > > > about CompiledPlan, maybe I missed something else, maybe you can look > > at > > > my > > > > POC first to see if there is somewhere to worry about. > > > > > > > > > Sorry, I forgot to remind you that Timo's concern about the changes > > to > > > > the > > > > > CompiledPlan looks like is still not covered in the FLIP. > > > > > > > > @Jing We could have more discussion about naming, but I prefer that > the > > > > naming should be consistent with the DataStream API. > > > > About aligning splits/partitions/shards, maybe you missed FLIP-217[2] > > > which > > > > aims to support watermark alignment of source splits. > > > > > > > > > After reading the most up-to-date Flip, I didn't find any > information > > > if > > > > > this solution will support aligning splits/partitions/shards [1]. > > Did I > > > > > miss anything? > > > > > > > > Best > > > > Kui Yuan > > > > > > > > [1] the POC: > > > > https://github.com/yuchengxin/flink/tree/yuankui/watermark_params > > > > [2] FLIP-217: > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-217%3A+Support+watermark+alignment+of+source+splits > > > > > > > > > > > > Jing Ge <j...@ververica.com.invalid> 于2023年3月3日周五 08:03写道: > > > > > > > > > Hi, > > > > > > > > > > Thanks Kui for driving this Flip and thanks all for the informative > > > > > discussion. > > > > > > > > > > @Timo > > > > > > > > > > Your suggestion about the naming convention is excellent. Thanks! I > > was > > > > > wondering why you, exceptionally, suggested 'scan.idle-timeout' > > instead > > > > of > > > > > 'scan.watermark.idle-timeout'. I must miss something here. > > > > > > > > > > There is one more NIT. I am just aware that "drift" is used for the > > > > > watermark alignment. It seems to be fine while using DataStream > API, > > > > > because we will not really see it. But with the OPTIONS in SQL, a > > much > > > > > bigger group of users (including SRE, tech support, etc) will see > the > > > > word > > > > > "drift". Given that "drift" wasn't used widely yet and with all > > > training > > > > > materials, Flink doc [1][2][3] (search with "lag"), "lag" has been > > used > > > > to > > > > > describe timestamp difference between watermark and its > > > > > corresponding event. Do we really need to introduce another term > for > > > the > > > > > same thing? How about using > 'scan.watermark.alignment.max-lag'='1min' > > > and > > > > > change the parameter name from maxAllowedWatermarkDrift to > > > > > maxAllowedWatermarkLag [4] because of naming consistency? Just my > two > > > > cents > > > > > worth. > > > > > > > > > > @Kui > > > > > > > > > > After reading the most up-to-date Flip, I didn't find any > information > > > if > > > > > this solution will support aligning splits/partitions/shards [1]. > > Did I > > > > > miss anything? > > > > > > > > > > +1 for the concern about Table API. We'd be better keep Table API > and > > > SQL > > > > > synced for new features. > > > > > > > > > > Best regards, > > > > > Jing > > > > > > > > > > > > > > > [1] > > > > > > > > > > > > > > > > > > > > https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/dev/datastream/event-time/generating_watermarks/#watermark-alignment-_beta_ > > > > > [2] > > > > > > > > > > > > > > > > > > > > https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/dev/datastream/event-time/built_in/#fixed-amount-of-lateness > > > > > > > > > > [3] > > > > > > > > > > > > > > > > > > > > https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/connectors/datastream/kafka/ > > > > > [4] > > > > > > > > > > > > > > > > > > > > https://github.com/apache/flink/blob/4aacff572a9e3996c5dee9273638831e4040c767/flink-core/src/main/java/org/apache/flink/api/common/eventtime/WatermarkStrategy.java#L169 > > > > > > > > > > > > > > > > > > > > On Wed, Mar 1, 2023 at 3:54 PM Timo Walther <twal...@apache.org> > > > wrote: > > > > > > > > > > > Reg. 2: > > > > > > > event gap emit strategy [...] no matter in DataStream or SQL > > > > > > > > > > > > Jark raised a very good point. I thought we only expose what is > > > > > > contained in DataStream API already. If this strategy is not part > > of > > > > > > DataStream API, would like to exclude it from the FLIP. We need > to > > be > > > > > > careful which strategies we offer by default. > > > > > > > > > > > > Reg 1: > > > > > > This already has a JIRA ticket with additional thoughts on this > > > topic: > > > > > > https://issues.apache.org/jira/browse/FLINK-25221 > > > > > > > > > > > > Regards, > > > > > > Timo > > > > > > > > > > > > > > > > > > > > > > > > On 01.03.23 12:31, Jark Wu wrote: > > > > > > > Sorry, I forgot to remind you that Timo's concern about the > > changes > > > > to > > > > > > the > > > > > > > CompiledPlan looks like is still not covered in the FLIP. > > > > > > > > > > > > > > Best, > > > > > > > Jark > > > > > > > > > > > > > > On Wed, 1 Mar 2023 at 19:28, Jark Wu <imj...@gmail.com> wrote: > > > > > > > > > > > > > >> Hi Kui, > > > > > > >> > > > > > > >> Thank you for the great proposal, I think this is already in a > > > good > > > > > > shape. > > > > > > >> > > > > > > >> Just a kind reminder, according to the community > guidelines[1], > > > > > > >> if there are unresponsive reviewers, a typical reasonable time > > > > > > >> to wait for responses is one week, but be pragmatic about it. > > > > > > >> > > > > > > >> Regarding the FLIP, I have some comments below: > > > > > > >> > > > > > > >> 1. IIRC, this is the first time we introduce the > framework-level > > > > > > connector > > > > > > >> options that the option is not recognized and handled by > > > connectors. > > > > > > >> The FLIP should cover how framework filters the watermark > > related > > > > > > options > > > > > > >> to avoid discover connector factory failed, and what happens > if > > > the > > > > > > >> connector > > > > > > >> already supported the conflict options. > > > > > > >> > > > > > > >> 2. I'm not sure about the usage scenarios of event gap emit > > > > strategy. > > > > > Do > > > > > > >> you have any specific use case of this strategy? I'm confused > > why > > > no > > > > > one > > > > > > >> requested this strategy before no matter in DataStream or SQL, > > but > > > > > maybe > > > > > > >> I missed something. I'm not against to add this option, but > just > > > > want > > > > > to > > > > > > >> be > > > > > > >> careful when adding new API because it's hard to remove in the > > > > future. > > > > > > >> > > > > > > >> > > > > > > >> 3. Adding a "Public Interface"[2] section to summarize the > > > > > > >> proposed APIs and options would be better for developers to > > > > > > >> know the impact. Currently, the APIs are scattered in the long > > > > > > >> design sections. > > > > > > >> > > > > > > >> Best, > > > > > > >> Jark > > > > > > >> > > > > > > >> > > > > > > >> [1]: > > > > > > >> > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals > > > > > > >> [2]: > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP+Template > > > > > > >> > > > > > > >> On Wed, 1 Mar 2023 at 16:56, Kui Yuan <catye...@gmail.com> > > wrote: > > > > > > >> > > > > > > >>> Hi all, > > > > > > >>> > > > > > > >>> Thanks for all discussions! > > > > > > >>> > > > > > > >>> Anyone else have questions or suggestions? if not, I will > > start a > > > > > vote > > > > > > >>> thread later. > > > > > > >>> > > > > > > >>> Best > > > > > > >>> Kui Yuan > > > > > > >>> > > > > > > >>> kui yuan <catye...@gmail.com> 于2023年2月27日周一 20:21写道: > > > > > > >>> > > > > > > >>>> Hi Timo, > > > > > > >>>> > > > > > > >>>> Thanks for your advice. I totally agree with your suggestion > > of > > > > > naming > > > > > > >>>> convention, I will rename these options and update the flip > > > later, > > > > > > >>> thanks > > > > > > >>>> very much. > > > > > > >>>> > > > > > > >>>> In our internal implementation we had put these options > inside > > > the > > > > > > >>>> `FactoryUtil`, just as you expect. We have also taken into > > > > account > > > > > > the > > > > > > >>>> changes to the CompiledPlan and we have packaged these > options > > > > > > >>>> appropriately to minimize intrusiveness and ensure the > > > > compatibility > > > > > > to > > > > > > >>> the > > > > > > >>>> `WatermarkPushDownSpec`. > > > > > > >>>> > > > > > > >>>>> A hint to the implementation: I would suggest that we add > > those > > > > > > >>> options > > > > > > >>>>> to `FactoryUtil`. All cross-connector options should end up > > > > there. > > > > > > >>>> > > > > > > >>>> > > > > > > >>>>> Please also consider the changes to the CompiledPlan in > your > > > > FLIP. > > > > > > >>> This > > > > > > >>>>> change has implications on the JSON format as watermark > > > strategy > > > > of > > > > > > >>>>> ExecNode becomes more complex, see WatermarkPushDownSpec > > > > > > >>>> > > > > > > >>>> Best > > > > > > >>>> Kui Yuan > > > > > > >>>> > > > > > > >>>> Timo Walther <twal...@apache.org> 于2023年2月27日周一 18:05写道: > > > > > > >>>> > > > > > > >>>>> Hi Kui Yuan, > > > > > > >>>>> > > > > > > >>>>> thanks for working on this FLIP. Let me also give some > > comments > > > > > about > > > > > > >>>>> the proposed changes. > > > > > > >>>>> > > > > > > >>>>> I support the direction of this FLIP about handling these > > > > > > >>>>> watermark-specific properties through options and > > > /*+OPTIONS(...) > > > > > */ > > > > > > >>>>> hints. > > > > > > >>>>> > > > > > > >>>>> Regarding naming, I would like to keep the options in sync > > with > > > > > > >>> existing > > > > > > >>>>> options: > > > > > > >>>>> > > > > > > >>>>> > 'watermark.emit.strategy'='ON_EVENT' > > > > > > >>>>> > > > > > > >>>>> Let's use lower case (e.g. `on-event`) that matches with > > > > properties > > > > > > >>> like > > > > > > >>>>> sink.partitioner [1] or sink.delivery-guarantee [2]. > > > > > > >>>>> > > > > > > >>>>> > 'source.idle-timeout'='1min' > > > > > > >>>>> > > > > > > >>>>> According to FLIP-122 [3], we want to prefix all > scan-source > > > > > related > > > > > > >>>>> properties with `scan.*`. This clearly includes > idle-timeout > > > and > > > > > > >>>>> actually also watermark strategies which don't apply for > > lookup > > > > > > >>> sources. > > > > > > >>>>> > > > > > > >>>>> Summarizing the comments above, we should use the following > > > > > options: > > > > > > >>>>> > > > > > > >>>>> 'scan.watermark.emit.strategy'='on-event', > > > > > > >>>>> 'scan.watermark.emit.on-event.gap'='10000', > > > > > > >>>>> 'scan.idle-timeout'='1min', > > > > > > >>>>> 'scan.watermark.alignment.group'='alignment-group-1', > > > > > > >>>>> 'scan.watermark.alignment.max-drift'='1min', > > > > > > >>>>> 'scan.watermark.alignment.update-interval'='1s' > > > > > > >>>>> > > > > > > >>>>> I know that this makes the keys even longer, but given that > > > those > > > > > > >>>>> options are for power users this should be acceptable. It > > also > > > > > > clearly > > > > > > >>>>> indicates which options are for sinks, scans, and lookups. > > This > > > > > > >>>>> potentially also helps in allow lists. > > > > > > >>>>> > > > > > > >>>>> A hint to the implementation: I would suggest that we add > > those > > > > > > options > > > > > > >>>>> to `FactoryUtil`. All cross-connector options should end up > > > > there. > > > > > > >>>>> > > > > > > >>>>> Please also consider the changes to the CompiledPlan in > your > > > > FLIP. > > > > > > This > > > > > > >>>>> change has implications on the JSON format as watermark > > > strategy > > > > of > > > > > > >>>>> ExecNode becomes more complex, see WatermarkPushDownSpec > [4]. > > > > > > >>>>> > > > > > > >>>>> Regards, > > > > > > >>>>> Timo > > > > > > >>>>> > > > > > > >>>>> > > > > > > >>>>> [1] > > > > > > >>>>> > > > > > > >>>>> > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > https://nightlies.apache.org/flink/flink-docs-release-1.11/dev/table/connectors/kafka.html#sink-partitioner > > > > > > >>>>> [2] > > > > > > >>>>> > > > > > > >>>>> > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/connectors/table/kafka/#sink-delivery-guarantee > > > > > > >>>>> [3] > > > > > > >>>>> > > > > > > >>>>> > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-122%3A+New+Connector+Property+Keys+for+New+Factory > > > > > > >>>>> [4] > > > > > > >>>>> > > > > > > >>>>> > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > https://github.com/apache/flink/blob/master/flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/plan/abilities/source/WatermarkPushDownSpec.java > > > > > > >>>>> > > > > > > >>>>> > > > > > > >>>>> On 24.02.23 04:55, kui yuan wrote: > > > > > > >>>>>> Hi all, > > > > > > >>>>>> > > > > > > >>>>>> I have updated the flip according to the discussion, and > we > > > will > > > > > > >>> extend > > > > > > >>>>> the > > > > > > >>>>>> watermark-related features with both table options and > > > 'OPTIONS' > > > > > > >>> hint, > > > > > > >>>>> like > > > > > > >>>>>> this: > > > > > > >>>>>> > > > > > > >>>>>> ``` > > > > > > >>>>>> -- configure in table options > > > > > > >>>>>> CREATE TABLE user_actions ( > > > > > > >>>>>> ... > > > > > > >>>>>> user_action_time TIMESTAMP(3), > > > > > > >>>>>> WATERMARK FOR user_action_time AS user_action_time - > > > > INTERVAL > > > > > > '5' > > > > > > >>>>> SECOND > > > > > > >>>>>> ) WITH ( > > > > > > >>>>>> 'watermark.emit.strategy'='ON_PERIODIC', > > > > > > >>>>>> ... > > > > > > >>>>>> ); > > > > > > >>>>>> > > > > > > >>>>>> -- use 'OPTIONS' hint > > > > > > >>>>>> select ... from source_table /*+ > > > > > OPTIONS('watermark.emit.strategy'= > > > > > > >>>>>> 'ON_PERIODIC') */ > > > > > > >>>>>> ``` > > > > > > >>>>>> > > > > > > >>>>>> Does everybody have any other questions? > > > > > > >>>>>> > > > > > > >>>>>> Best > > > > > > >>>>>> Kui Yuan > > > > > > >>>>>> > > > > > > >>>>>> kui yuan <catye...@gmail.com> 于2023年2月23日周四 20:05写道: > > > > > > >>>>>> > > > > > > >>>>>>> Hi all, > > > > > > >>>>>>> > > > > > > >>>>>>> Thanks for all suggestions. > > > > > > >>>>>>> > > > > > > >>>>>>> We will extend the watermark-related features in SQL > layer > > > with > > > > > > >>> dynamic > > > > > > >>>>>>> table options and 'OPTIONS' hint, just as everyone > > expects. I > > > > > will > > > > > > >>>>> modify > > > > > > >>>>>>> Flip-296 as discussed. > > > > > > >>>>>>> > > > > > > >>>>>>> @Martijn As far as I know, there is no hint interface in > > the > > > > > table > > > > > > >>> API, > > > > > > >>>>>>> so we can't use hint in table API directly. if we need to > > > > extend > > > > > > the > > > > > > >>>>> hint > > > > > > >>>>>>> interface in the table API, maybe we need another flip. > > > > However, > > > > > if > > > > > > >>> we > > > > > > >>>>>>> extend the watermark-related features in the dynamic > table > > > > > options, > > > > > > >>>>> maybe > > > > > > >>>>>>> we are able to use them indirectly in the table API like > > > > this[1]: > > > > > > >>>>>>> > > > > > > >>>>>>> ``` > > > > > > >>>>>>> // register a table named "Orders" > > > > > > >>>>>>> tableEnv.executeSql("CREATE TABLE Orders (`user` BIGINT, > > > > product > > > > > > >>>>> STRING, > > > > > > >>>>>>> amount INT) WITH > > ('watermark.emit.strategy'='ON_EVENT'...)"); > > > > > > >>>>>>> ``` > > > > > > >>>>>>> > > > > > > >>>>>>> [1] > > > > > > >>>>>>> > > > > > > >>>>> > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/create/ > > > > > > >>>>>>> > > > > > > >>>>>>> Best > > > > > > >>>>>>> Kui Yuan > > > > > > >>>>>>> > > > > > > >>>>>>> Yun Tang <myas...@live.com> 于2023年2月23日周四 17:46写道: > > > > > > >>>>>>> > > > > > > >>>>>>>> Thanks for the warm discussions! > > > > > > >>>>>>>> > > > > > > >>>>>>>> I had an offline discussion with Kui about the replies. > I > > > > think > > > > > I > > > > > > >>>>> could > > > > > > >>>>>>>> give some explanations on the original intention to > > > introduce > > > > > > >>> another > > > > > > >>>>>>>> WATERMARK_PARAMS. If we take a look at the current > > > datastream > > > > > API, > > > > > > >>> the > > > > > > >>>>>>>> watermark strategy does not belong to any specific > > > connector. > > > > > And > > > > > > >>> we > > > > > > >>>>>>>> thought the dynamic table options were more like the > > > > > > configurations > > > > > > >>>>> within > > > > > > >>>>>>>> some specific connector. > > > > > > >>>>>>>> > > > > > > >>>>>>>> From the review comments, I think most people feel > good > > to > > > > > make > > > > > > it > > > > > > >>>>> part > > > > > > >>>>>>>> of the dynamic table options. I think this is fine if we > > > give > > > > > more > > > > > > >>>>> clear > > > > > > >>>>>>>> scope definition of the dynamic table options here. And > I > > > also > > > > > > >>> agree > > > > > > >>>>> with > > > > > > >>>>>>>> Jingsong's concern about adding SQL syntax which is the > > most > > > > > > >>>>> concerning > > > > > > >>>>>>>> part before launching this discussion. > > > > > > >>>>>>>> > > > > > > >>>>>>>> For Martijn's concern, if we accept to make the > > > > > watermark-related > > > > > > >>>>> options > > > > > > >>>>>>>> part of dynamic table options, the problem becomes > another > > > > > topic: > > > > > > >>> how > > > > > > >>>>> to > > > > > > >>>>>>>> support the dynamic table options in table API, which is > > > > > deserved > > > > > > >>> to > > > > > > >>>>> create > > > > > > >>>>>>>> another FLIP. > > > > > > >>>>>>>> > > > > > > >>>>>>>> Best > > > > > > >>>>>>>> Yun Tang > > > > > > >>>>>>>> ________________________________ > > > > > > >>>>>>>> From: Martijn Visser <martijnvis...@apache.org> > > > > > > >>>>>>>> Sent: Thursday, February 23, 2023 17:14 > > > > > > >>>>>>>> To: dev@flink.apache.org <dev@flink.apache.org> > > > > > > >>>>>>>> Subject: Re: [DISCUSS] FLIP-296: Watermark options for > > table > > > > > API & > > > > > > >>> SQL > > > > > > >>>>>>>> > > > > > > >>>>>>>> Hi, > > > > > > >>>>>>>> > > > > > > >>>>>>>> While I can understand that there's a desire to first > > focus > > > on > > > > > > >>> solving > > > > > > >>>>>>>> this > > > > > > >>>>>>>> problem for SQL, I do wonder if we should ignore the > Table > > > API > > > > > at > > > > > > >>> this > > > > > > >>>>>>>> point. If we could include the syntax for the Table API, > > it > > > > > > >>>>> potentially > > > > > > >>>>>>>> could also be implemented by another contributor without > > > > needing > > > > > > to > > > > > > >>>>> create > > > > > > >>>>>>>> another FLIP. If we don't design it right now, my > concern > > is > > > > > that > > > > > > >>> this > > > > > > >>>>>>>> will > > > > > > >>>>>>>> increase sparsity for the Table API which ultimately > hurts > > > > > > >>> adoption. > > > > > > >>>>>>>> > > > > > > >>>>>>>> With regards to the syntax, I have a preference to solve > > > this > > > > > via > > > > > > >>> the > > > > > > >>>>>>>> connector options (e.g. like you can currently specify > > > things > > > > as > > > > > > >>>>>>>> scan.startup.specific-offsets or scan.bounded.mode for > the > > > > Kafka > > > > > > >>>>>>>> connector). You could still use the dynamic table > options > > to > > > > > > >>>>> override/add > > > > > > >>>>>>>> them. > > > > > > >>>>>>>> > > > > > > >>>>>>>> Best regards, > > > > > > >>>>>>>> > > > > > > >>>>>>>> Martijn > > > > > > >>>>>>>> > > > > > > >>>>>>>> On Thu, Feb 23, 2023 at 7:21 AM Shammon FY < > > > zjur...@gmail.com > > > > > > > > > > > >>> wrote: > > > > > > >>>>>>>> > > > > > > >>>>>>>>> Hi kui > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> Thanks for your answer and +1 to yuxia too > > > > > > >>>>>>>>> > > > > > > >>>>>>>>>> we should not bind the watermark-related options to a > > > > > connector > > > > > > >>> to > > > > > > >>>>>>>> ensure > > > > > > >>>>>>>>> semantic clarity. > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> In my opinion, adding watermark-related options to a > > > > connector > > > > > is > > > > > > >>>>> much > > > > > > >>>>>>>> more > > > > > > >>>>>>>>> clear. Currently users can define simple watermark > > strategy > > > > in > > > > > > >>> DDL, > > > > > > >>>>>>>> adding > > > > > > >>>>>>>>> more configuration items in connector options is easy > to > > > > > > >>> understand > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> Best, > > > > > > >>>>>>>>> Shammon > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> On Thu, Feb 23, 2023 at 10:52 AM Jingsong Li < > > > > > > >>> jingsongl...@gmail.com > > > > > > >>>>>> > > > > > > >>>>>>>>> wrote: > > > > > > >>>>>>>>> > > > > > > >>>>>>>>>> Thanks for your proposal. > > > > > > >>>>>>>>>> > > > > > > >>>>>>>>>> +1 to yuxia, consider watermark-related hints as > option > > > > hints. > > > > > > >>>>>>>>>> > > > > > > >>>>>>>>>> Personally, I am cautious about adding SQL syntax, > > > > > > >>> WATERMARK_PARAMS > > > > > > >>>>> is > > > > > > >>>>>>>>>> also SQL syntax to some extent. > > > > > > >>>>>>>>>> > > > > > > >>>>>>>>>> We can use OPTIONS to meet this requirement if > possible. > > > > > > >>>>>>>>>> > > > > > > >>>>>>>>>> Best, > > > > > > >>>>>>>>>> Jingsong > > > > > > >>>>>>>>>> > > > > > > >>>>>>>>>> On Thu, Feb 23, 2023 at 10:41 AM yuxia < > > > > > > >>> luoyu...@alumni.sjtu.edu.cn > > > > > > >>>>>> > > > > > > >>>>>>>>>> wrote: > > > > > > >>>>>>>>>>> > > > > > > >>>>>>>>>>> Hi, Yuan Kui. > > > > > > >>>>>>>>>>> Thanks for driving it. > > > > > > >>>>>>>>>>> IMO, the 'OPTIONS' hint may be not only specific to > the > > > > > > >>> connector > > > > > > >>>>>>>>>> options. Just as a reference, we also have > > > > > `sink.parallelism`[1] > > > > > > >>> as > > > > > > >>>>> a > > > > > > >>>>>>>>>> connector options. It enables > > > > > > >>>>>>>>>>> user to specific the writer's parallelism dynamically > > > > > > per-query. > > > > > > >>>>>>>>>>> > > > > > > >>>>>>>>>>> Personally, I perfer to consider watermark-related > > hints > > > as > > > > > > >>> option > > > > > > >>>>>>>>>> hints. So, user can define a default watermark > strategy > > > for > > > > > the > > > > > > >>>>> table, > > > > > > >>>>>>>>> and > > > > > > >>>>>>>>>> if user dosen't needed to changes it, they need to do > > > > nothing > > > > > in > > > > > > >>>>> their > > > > > > >>>>>>>>>> query instead of specific it ervery time. > > > > > > >>>>>>>>>>> > > > > > > >>>>>>>>>>> [1] > > > > > > >>>>>>>>>> > > > > > > >>>>>>>>> > > > > > > >>>>>>>> > > > > > > >>>>> > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > https://nightlies.apache.org/flink/flink-docs-master/zh/docs/connectors/table/filesystem/#sink-parallelism > > > > > > >>>>>>>>>>> > > > > > > >>>>>>>>>>> Best regards, > > > > > > >>>>>>>>>>> Yuxia > > > > > > >>>>>>>>>>> > > > > > > >>>>>>>>>>> Best regards, > > > > > > >>>>>>>>>>> Yuxia > > > > > > >>>>>>>>>>> > > > > > > >>>>>>>>>>> ----- 原始邮件 ----- > > > > > > >>>>>>>>>>> 发件人: "kui yuan" <catye...@gmail.com> > > > > > > >>>>>>>>>>> 收件人: "dev" <dev@flink.apache.org> > > > > > > >>>>>>>>>>> 抄送: "Jark Wu" <imj...@gmail.com> > > > > > > >>>>>>>>>>> 发送时间: 星期三, 2023年 2 月 22日 下午 10:08:11 > > > > > > >>>>>>>>>>> 主题: Re: [DISCUSS] FLIP-296: Watermark options for > table > > > > API & > > > > > > >>> SQL > > > > > > >>>>>>>>>>> > > > > > > >>>>>>>>>>> Hi all, > > > > > > >>>>>>>>>>> > > > > > > >>>>>>>>>>> Thanks for the lively discussion and I will respond > to > > > > these > > > > > > >>>>>>>> questions > > > > > > >>>>>>>>>> one > > > > > > >>>>>>>>>>> by one. However, there are also some common questions > > > and I > > > > > > will > > > > > > >>>>>>>> answer > > > > > > >>>>>>>>>>> together. > > > > > > >>>>>>>>>>> > > > > > > >>>>>>>>>>> @郑 Thanks for your reply. The features mentioned in > > this > > > > flip > > > > > > >>> are > > > > > > >>>>>>>> only > > > > > > >>>>>>>>>> for > > > > > > >>>>>>>>>>> those source connectors that implement the > > > > > > >>>>> SupportsWatermarkPushDown > > > > > > >>>>>>>>>>> interface, generating watermarks in other graph > > locations > > > > is > > > > > > >>> not in > > > > > > >>>>>>>> the > > > > > > >>>>>>>>>>> scope of this discussion. Perhaps another flip can be > > > > > proposed > > > > > > >>>>>>>> later to > > > > > > >>>>>>>>>>> implement this feature. > > > > > > >>>>>>>>>>> > > > > > > >>>>>>>>>>> @Shammon Thanks for your reply. In Flip-296, a > rejected > > > > > > >>> alternative > > > > > > >>>>>>>> is > > > > > > >>>>>>>>>>> adding watermark related options in the connector > > > > options,we > > > > > > >>>>> believe > > > > > > >>>>>>>>> that > > > > > > >>>>>>>>>>> we should not bind the watermark-related options to a > > > > > connector > > > > > > >>> to > > > > > > >>>>>>>>> ensure > > > > > > >>>>>>>>>>> semantic clarity. > > > > > > >>>>>>>>>>> > > > > > > >>>>>>>>>>>> What will happen if we add watermark related options > > in > > > > `the > > > > > > >>>>>>>>> connector > > > > > > >>>>>>>>>>>> options`? Will the connector ignore these options or > > > throw > > > > > an > > > > > > >>>>>>>>>> exception? > > > > > > >>>>>>>>>>>> How can we support this? > > > > > > >>>>>>>>>>> > > > > > > >>>>>>>>>>> If user defines different watermark configurations > for > > > one > > > > > > >>> table in > > > > > > >>>>>>>> two > > > > > > >>>>>>>>>>> places, I tend to prefer the first place would > prevail, > > > but > > > > > we > > > > > > >>> can > > > > > > >>>>>>>>> also > > > > > > >>>>>>>>>>> throw exception or just print logs to prompt the > user, > > > > which > > > > > > are > > > > > > >>>>>>>>>>> implementation details. > > > > > > >>>>>>>>>>> > > > > > > >>>>>>>>>>>> If one table is used by two operators with different > > > > > watermark > > > > > > >>>>>>>>> params, > > > > > > >>>>>>>>>>>> what will happen? > > > > > > >>>>>>>>>>> > > > > > > >>>>>>>>>>> @Martijn Thanks for your reply. I'm sorry that we > are > > > not > > > > > > >>>>>>>> particularly > > > > > > >>>>>>>>>>> accurate, this hint is mainly for SQL, not table > API. > > > > > > >>>>>>>>>>> > > > > > > >>>>>>>>>>>> While the FLIP talks about watermark options for > Table > > > > API & > > > > > > >>> SQL, > > > > > > >>>>>>>> I > > > > > > >>>>>>>>>> only > > > > > > >>>>>>>>>>>> see proposed syntax for SQL, not for the Table API. > > What > > > > is > > > > > > >>> your > > > > > > >>>>>>>>>> proposal > > > > > > >>>>>>>>>>>> for the Table API > > > > > > >>>>>>>>>>> > > > > > > >>>>>>>>>>> @Jane Thanks for your reply. For the first question, > If > > > the > > > > > > user > > > > > > >>>>>>>> uses > > > > > > >>>>>>>>>> this > > > > > > >>>>>>>>>>> hint on those sourse that does not implement the > > > > > > >>>>>>>>>> SupportsWatermarkPushDown > > > > > > >>>>>>>>>>> interface, it will be completely invalid. The task > will > > > run > > > > > as > > > > > > >>>>>>>> normal > > > > > > >>>>>>>>> as > > > > > > >>>>>>>>>> if > > > > > > >>>>>>>>>>> the hint had not been used. > > > > > > >>>>>>>>>>> > > > > > > >>>>>>>>>>>> What's the behavior if there are multiple table > > sources, > > > > > among > > > > > > >>>>>>>> which > > > > > > >>>>>>>>>>>> some do not support `SupportsWatermarkPushDown`? > > > > > > >>>>>>>>>>> > > > > > > >>>>>>>>>>> @Jane feedback that 'WATERMARK_PARAMS' is difficult > to > > > > > > remember, > > > > > > >>>>>>>>> perhaps > > > > > > >>>>>>>>>>> the naming issue can be put to the end of the > > discussion, > > > > > > >>> because > > > > > > >>>>>>>> more > > > > > > >>>>>>>>>>> people like @Martijn @Shuo are considering whether > > these > > > > > > >>>>>>>> configurations > > > > > > >>>>>>>>>>> should be put into the DDL or the 'OPTIONS' hint. > > Here's > > > > > what I > > > > > > >>>>>>>>>>> think, Putting these configs into DDL or putting them > > > into > > > > > > >>>>> 'OPTIONS' > > > > > > >>>>>>>>> hint > > > > > > >>>>>>>>>>> is actually the same thing, because the 'OPTIONS' > hint > > is > > > > > > mainly > > > > > > >>>>>>>> used > > > > > > >>>>>>>>> to > > > > > > >>>>>>>>>>> configure the properties of conenctor. The reason > why I > > > > want > > > > > to > > > > > > >>> use > > > > > > >>>>>>>> a > > > > > > >>>>>>>>> new > > > > > > >>>>>>>>>>> hint is to make sure the semantics clear, in my > opinion > > > the > > > > > > >>>>>>>>> configuration > > > > > > >>>>>>>>>>> of watermark should not be mixed up with connector. > > > > However, > > > > > a > > > > > > >>> new > > > > > > >>>>>>>> hint > > > > > > >>>>>>>>>>> does make it more difficult to use to some extent, > for > > > > > example, > > > > > > >>>>>>>> when a > > > > > > >>>>>>>>>> user > > > > > > >>>>>>>>>>> uses both 'OPTIONS' hint and 'WATERMARK_PARAMS' hint. > > For > > > > > this > > > > > > >>>>>>>> point, > > > > > > >>>>>>>>>> maby > > > > > > >>>>>>>>>>> it is more appropriate to use uniform 'OPTIONS' hint. > > > > > > >>>>>>>>>>> On the other hand, if we enrich more watermark option > > > keys > > > > in > > > > > > >>>>>>>> 'OPTIONS' > > > > > > >>>>>>>>>>> hints, The question will be what we treat the > > > definatrions > > > > of > > > > > > >>>>>>>>> 'OPTIONS' > > > > > > >>>>>>>>>>> hint, is this only specific to the connector options > or > > > > could > > > > > > be > > > > > > >>>>>>>> more? > > > > > > >>>>>>>>>>> Maybe @Jark could share more insights here. In my > > opion, > > > > > > >>> 'OPTIONS' > > > > > > >>>>>>>> is > > > > > > >>>>>>>>>> only > > > > > > >>>>>>>>>>> related to the connector options, which is not like > the > > > > > gernal > > > > > > >>>>>>>>> watermark > > > > > > >>>>>>>>>>> options. > > > > > > >>>>>>>>>>> > > > > > > >>>>>>>>>>> > > > > > > >>>>>>>>>>> > > > > > > >>>>>>>>>>> Shuo Cheng <njucs...@gmail.com> 于2023年2月22日周三 > 19:17写道: > > > > > > >>>>>>>>>>> > > > > > > >>>>>>>>>>>> Hi Kui, > > > > > > >>>>>>>>>>>> > > > > > > >>>>>>>>>>>> Thanks for driving the discussion. It's quite useful > > to > > > > > > >>> introduce > > > > > > >>>>>>>>>> Watermark > > > > > > >>>>>>>>>>>> options. I have some questions: > > > > > > >>>>>>>>>>>> > > > > > > >>>>>>>>>>>> What kind of hints is "WATERMARK_PARAMS"? > > > > > > >>>>>>>>>>>> Currently, we have two kinds of hints in Flink: > > Dynamic > > > > > Table > > > > > > >>>>>>>>> Options & > > > > > > >>>>>>>>>>>> Query Hints. As described in the Flip, > > > "WATERMARK_PARAMS" > > > > is > > > > > > >>> more > > > > > > >>>>>>>>> like > > > > > > >>>>>>>>>>>> Dynamic Table Options. So two questions arise here: > > > > > > >>>>>>>>>>>> > > > > > > >>>>>>>>>>>> 1) Are these watermark options to be exposed as > > > connector > > > > > > WITH > > > > > > >>>>>>>>>> options? Aa > > > > > > >>>>>>>>>>>> described in SQL Hints doc[1], "Dynamic Table > Options > > > > allow > > > > > > to > > > > > > >>>>>>>>>> specify or > > > > > > >>>>>>>>>>>> override table options dynamically", which implies > > that > > > > > these > > > > > > >>>>>>>> options > > > > > > >>>>>>>>>> can > > > > > > >>>>>>>>>>>> also be configured in WITH options. > > > > > > >>>>>>>>>>>> > > > > > > >>>>>>>>>>>> 2) Do we really need a new hint name like > > > > > 'WATERMARK_PARAMS', > > > > > > >>>>>>>> table > > > > > > >>>>>>>>>>>> options use "OPTIONS" as hint name, like '/*+ > > > > > > >>>>>>>>>>>> OPTIONS('csv.ignore-parse-errors'='true') */', maybe > > we > > > > can > > > > > > >>> enrich > > > > > > >>>>>>>>> more > > > > > > >>>>>>>>>>>> table option keys for watermark, e.g., /*+ > > > > > > >>>>>>>>>>>> OPTIONS('watermark.emit-strategy'='ON_PERIODIC') */. > > > > > > >>>>>>>>>>>> > > > > > > >>>>>>>>>>>> > > > > > > >>>>>>>>>>>> [1] > > > > > > >>>>>>>>>>>> > > > > > > >>>>>>>>>>>> > > > > > > >>>>>>>>>> > > > > > > >>>>>>>>> > > > > > > >>>>>>>> > > > > > > >>>>> > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/dev/table/sql/queries/hints/ > > > > > > >>>>>>>>>>>> > > > > > > >>>>>>>>>>>> On Wed, Feb 22, 2023 at 10:22 AM kui yuan < > > > > > catye...@gmail.com > > > > > > > > > > > > > >>>>>>>>> wrote: > > > > > > >>>>>>>>>>>> > > > > > > >>>>>>>>>>>>> Hi devs, > > > > > > >>>>>>>>>>>>> > > > > > > >>>>>>>>>>>>> > > > > > > >>>>>>>>>>>>> I'd like to start a discussion thread for > > FLIP-296[1]. > > > > This > > > > > > >>>>>>>> comes > > > > > > >>>>>>>>>> from an > > > > > > >>>>>>>>>>>>> offline discussion with @Yun Tang, and we hope to > > > enrich > > > > > > table > > > > > > >>>>>>>> API > > > > > > >>>>>>>>> & > > > > > > >>>>>>>>>> SQL > > > > > > >>>>>>>>>>>> to > > > > > > >>>>>>>>>>>>> support many watermark-related features which were > > only > > > > > > >>>>>>>> implemented > > > > > > >>>>>>>>>> at > > > > > > >>>>>>>>>>>> the > > > > > > >>>>>>>>>>>>> datastream API level. > > > > > > >>>>>>>>>>>>> > > > > > > >>>>>>>>>>>>> > > > > > > >>>>>>>>>>>>> Basically, we want to introduce watermark options > in > > > > table > > > > > > >>> API & > > > > > > >>>>>>>>> SQL > > > > > > >>>>>>>>>> via > > > > > > >>>>>>>>>>>>> SQL hint named 'WATERMARK_PARAMS' to support > > features: > > > > > > >>>>>>>>>>>>> > > > > > > >>>>>>>>>>>>> 1、Configurable watermark emit strategy > > > > > > >>>>>>>>>>>>> > > > > > > >>>>>>>>>>>>> 2、Dealing with idle sources > > > > > > >>>>>>>>>>>>> > > > > > > >>>>>>>>>>>>> 3、Watermark alignment > > > > > > >>>>>>>>>>>>> > > > > > > >>>>>>>>>>>>> > > > > > > >>>>>>>>>>>>> Last but not least, thanks to Qingsheng and Jing > > Zhang > > > > for > > > > > > the > > > > > > >>>>>>>>>> initial > > > > > > >>>>>>>>>>>>> reviews. > > > > > > >>>>>>>>>>>>> > > > > > > >>>>>>>>>>>>> > > > > > > >>>>>>>>>>>>> Looking forward to your thoughts and any feedback > is > > > > > > >>>>>>>> appreciated! > > > > > > >>>>>>>>>>>>> > > > > > > >>>>>>>>>>>>> > > > > > > >>>>>>>>>>>>> [1] > > > > > > >>>>>>>>>>>>> > > > > > > >>>>>>>>>>>> > > > > > > >>>>>>>>>> > > > > > > >>>>>>>>> > > > > > > >>>>>>>> > > > > > > >>>>> > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=240884405 > > > > > > >>>>>>>>>>>>> > > > > > > >>>>>>>>>>>>> > > > > > > >>>>>>>>>>>>> Best > > > > > > >>>>>>>>>>>>> > > > > > > >>>>>>>>>>>>> Yuan Kui > > > > > > >>>>>>>>>>>>> > > > > > > >>>>>>>>>>>> > > > > > > >>>>>>>>>> > > > > > > >>>>>>>>> > > > > > > >>>>>>>> > > > > > > >>>>>>> > > > > > > >>>>>> > > > > > > >>>>> > > > > > > >>>>> > > > > > > >>> > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >