Hi all,
Thanks for all discussions!
Anyone else have questions or suggestions? if not, I will start a vote
thread later.
Best
Kui Yuan
kui yuan <catye...@gmail.com> 于2023年2月27日周一 20:21写道:
Hi Timo,
Thanks for your advice. I totally agree with your suggestion of naming
convention, I will rename these options and update the flip later,
thanks
very much.
In our internal implementation we had put these options inside the
`FactoryUtil`, just as you expect. We have also taken into account the
changes to the CompiledPlan and we have packaged these options
appropriately to minimize intrusiveness and ensure the compatibility to
the
`WatermarkPushDownSpec`.
A hint to the implementation: I would suggest that we add those
options
to `FactoryUtil`. All cross-connector options should end up there.
Please also consider the changes to the CompiledPlan in your FLIP.
This
change has implications on the JSON format as watermark strategy of
ExecNode becomes more complex, see WatermarkPushDownSpec
Best
Kui Yuan
Timo Walther <twal...@apache.org> 于2023年2月27日周一 18:05写道:
Hi Kui Yuan,
thanks for working on this FLIP. Let me also give some comments about
the proposed changes.
I support the direction of this FLIP about handling these
watermark-specific properties through options and /*+OPTIONS(...) */
hints.
Regarding naming, I would like to keep the options in sync with
existing
options:
> 'watermark.emit.strategy'='ON_EVENT'
Let's use lower case (e.g. `on-event`) that matches with properties
like
sink.partitioner [1] or sink.delivery-guarantee [2].
> 'source.idle-timeout'='1min'
According to FLIP-122 [3], we want to prefix all scan-source related
properties with `scan.*`. This clearly includes idle-timeout and
actually also watermark strategies which don't apply for lookup
sources.
Summarizing the comments above, we should use the following options:
'scan.watermark.emit.strategy'='on-event',
'scan.watermark.emit.on-event.gap'='10000',
'scan.idle-timeout'='1min',
'scan.watermark.alignment.group'='alignment-group-1',
'scan.watermark.alignment.max-drift'='1min',
'scan.watermark.alignment.update-interval'='1s'
I know that this makes the keys even longer, but given that those
options are for power users this should be acceptable. It also clearly
indicates which options are for sinks, scans, and lookups. This
potentially also helps in allow lists.
A hint to the implementation: I would suggest that we add those options
to `FactoryUtil`. All cross-connector options should end up there.
Please also consider the changes to the CompiledPlan in your FLIP. This
change has implications on the JSON format as watermark strategy of
ExecNode becomes more complex, see WatermarkPushDownSpec [4].
Regards,
Timo
[1]
https://nightlies.apache.org/flink/flink-docs-release-1.11/dev/table/connectors/kafka.html#sink-partitioner
[2]
https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/connectors/table/kafka/#sink-delivery-guarantee
[3]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-122%3A+New+Connector+Property+Keys+for+New+Factory
[4]
https://github.com/apache/flink/blob/master/flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/plan/abilities/source/WatermarkPushDownSpec.java
On 24.02.23 04:55, kui yuan wrote:
Hi all,
I have updated the flip according to the discussion, and we will
extend
the
watermark-related features with both table options and 'OPTIONS'
hint,
like
this:
```
-- configure in table options
CREATE TABLE user_actions (
...
user_action_time TIMESTAMP(3),
WATERMARK FOR user_action_time AS user_action_time - INTERVAL '5'
SECOND
) WITH (
'watermark.emit.strategy'='ON_PERIODIC',
...
);
-- use 'OPTIONS' hint
select ... from source_table /*+ OPTIONS('watermark.emit.strategy'=
'ON_PERIODIC') */
```
Does everybody have any other questions?
Best
Kui Yuan
kui yuan <catye...@gmail.com> 于2023年2月23日周四 20:05写道:
Hi all,
Thanks for all suggestions.
We will extend the watermark-related features in SQL layer with
dynamic
table options and 'OPTIONS' hint, just as everyone expects. I will
modify
Flip-296 as discussed.
@Martijn As far as I know, there is no hint interface in the table
API,
so we can't use hint in table API directly. if we need to extend the
hint
interface in the table API, maybe we need another flip. However, if
we
extend the watermark-related features in the dynamic table options,
maybe
we are able to use them indirectly in the table API like this[1]:
```
// register a table named "Orders"
tableEnv.executeSql("CREATE TABLE Orders (`user` BIGINT, product
STRING,
amount INT) WITH ('watermark.emit.strategy'='ON_EVENT'...)");
```
[1]
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/create/
Best
Kui Yuan
Yun Tang <myas...@live.com> 于2023年2月23日周四 17:46写道:
Thanks for the warm discussions!
I had an offline discussion with Kui about the replies. I think I
could
give some explanations on the original intention to introduce
another
WATERMARK_PARAMS. If we take a look at the current datastream API,
the
watermark strategy does not belong to any specific connector. And
we
thought the dynamic table options were more like the configurations
within
some specific connector.
From the review comments, I think most people feel good to make it
part
of the dynamic table options. I think this is fine if we give more
clear
scope definition of the dynamic table options here. And I also
agree
with
Jingsong's concern about adding SQL syntax which is the most
concerning
part before launching this discussion.
For Martijn's concern, if we accept to make the watermark-related
options
part of dynamic table options, the problem becomes another topic:
how
to
support the dynamic table options in table API, which is deserved
to
create
another FLIP.
Best
Yun Tang
________________________________
From: Martijn Visser <martijnvis...@apache.org>
Sent: Thursday, February 23, 2023 17:14
To: dev@flink.apache.org <dev@flink.apache.org>
Subject: Re: [DISCUSS] FLIP-296: Watermark options for table API &
SQL
Hi,
While I can understand that there's a desire to first focus on
solving
this
problem for SQL, I do wonder if we should ignore the Table API at
this
point. If we could include the syntax for the Table API, it
potentially
could also be implemented by another contributor without needing to
create
another FLIP. If we don't design it right now, my concern is that
this
will
increase sparsity for the Table API which ultimately hurts
adoption.
With regards to the syntax, I have a preference to solve this via
the
connector options (e.g. like you can currently specify things as
scan.startup.specific-offsets or scan.bounded.mode for the Kafka
connector). You could still use the dynamic table options to
override/add
them.
Best regards,
Martijn
On Thu, Feb 23, 2023 at 7:21 AM Shammon FY <zjur...@gmail.com>
wrote:
Hi kui
Thanks for your answer and +1 to yuxia too
we should not bind the watermark-related options to a connector
to
ensure
semantic clarity.
In my opinion, adding watermark-related options to a connector is
much
more
clear. Currently users can define simple watermark strategy in
DDL,
adding
more configuration items in connector options is easy to
understand
Best,
Shammon
On Thu, Feb 23, 2023 at 10:52 AM Jingsong Li <
jingsongl...@gmail.com
wrote:
Thanks for your proposal.
+1 to yuxia, consider watermark-related hints as option hints.
Personally, I am cautious about adding SQL syntax,
WATERMARK_PARAMS
is
also SQL syntax to some extent.
We can use OPTIONS to meet this requirement if possible.
Best,
Jingsong
On Thu, Feb 23, 2023 at 10:41 AM yuxia <
luoyu...@alumni.sjtu.edu.cn
wrote:
Hi, Yuan Kui.
Thanks for driving it.
IMO, the 'OPTIONS' hint may be not only specific to the
connector
options. Just as a reference, we also have `sink.parallelism`[1]
as
a
connector options. It enables
user to specific the writer's parallelism dynamically per-query.
Personally, I perfer to consider watermark-related hints as
option
hints. So, user can define a default watermark strategy for the
table,
and
if user dosen't needed to changes it, they need to do nothing in
their
query instead of specific it ervery time.
[1]
https://nightlies.apache.org/flink/flink-docs-master/zh/docs/connectors/table/filesystem/#sink-parallelism
Best regards,
Yuxia
Best regards,
Yuxia
----- 原始邮件 -----
发件人: "kui yuan" <catye...@gmail.com>
收件人: "dev" <dev@flink.apache.org>
抄送: "Jark Wu" <imj...@gmail.com>
发送时间: 星期三, 2023年 2 月 22日 下午 10:08:11
主题: Re: [DISCUSS] FLIP-296: Watermark options for table API &
SQL
Hi all,
Thanks for the lively discussion and I will respond to these
questions
one
by one. However, there are also some common questions and I will
answer
together.
@郑 Thanks for your reply. The features mentioned in this flip
are
only
for
those source connectors that implement the
SupportsWatermarkPushDown
interface, generating watermarks in other graph locations is
not in
the
scope of this discussion. Perhaps another flip can be proposed
later to
implement this feature.
@Shammon Thanks for your reply. In Flip-296, a rejected
alternative
is
adding watermark related options in the connector options,we
believe
that
we should not bind the watermark-related options to a connector
to
ensure
semantic clarity.
What will happen if we add watermark related options in `the
connector
options`? Will the connector ignore these options or throw an
exception?
How can we support this?
If user defines different watermark configurations for one
table in
two
places, I tend to prefer the first place would prevail, but we
can
also
throw exception or just print logs to prompt the user, which are
implementation details.
If one table is used by two operators with different watermark
params,
what will happen?
@Martijn Thanks for your reply. I'm sorry that we are not
particularly
accurate, this hint is mainly for SQL, not table API.
While the FLIP talks about watermark options for Table API &
SQL,
I
only
see proposed syntax for SQL, not for the Table API. What is
your
proposal
for the Table API
@Jane Thanks for your reply. For the first question, If the user
uses
this
hint on those sourse that does not implement the
SupportsWatermarkPushDown
interface, it will be completely invalid. The task will run as
normal
as
if
the hint had not been used.
What's the behavior if there are multiple table sources, among
which
some do not support `SupportsWatermarkPushDown`?
@Jane feedback that 'WATERMARK_PARAMS' is difficult to remember,
perhaps
the naming issue can be put to the end of the discussion,
because
more
people like @Martijn @Shuo are considering whether these
configurations
should be put into the DDL or the 'OPTIONS' hint. Here's what I
think, Putting these configs into DDL or putting them into
'OPTIONS'
hint
is actually the same thing, because the 'OPTIONS' hint is mainly
used
to
configure the properties of conenctor. The reason why I want to
use
a
new
hint is to make sure the semantics clear, in my opinion the
configuration
of watermark should not be mixed up with connector. However, a
new
hint
does make it more difficult to use to some extent, for example,
when a
user
uses both 'OPTIONS' hint and 'WATERMARK_PARAMS' hint. For this
point,
maby
it is more appropriate to use uniform 'OPTIONS' hint.
On the other hand, if we enrich more watermark option keys in
'OPTIONS'
hints, The question will be what we treat the definatrions of
'OPTIONS'
hint, is this only specific to the connector options or could be
more?
Maybe @Jark could share more insights here. In my opion,
'OPTIONS'
is
only
related to the connector options, which is not like the gernal
watermark
options.
Shuo Cheng <njucs...@gmail.com> 于2023年2月22日周三 19:17写道:
Hi Kui,
Thanks for driving the discussion. It's quite useful to
introduce
Watermark
options. I have some questions:
What kind of hints is "WATERMARK_PARAMS"?
Currently, we have two kinds of hints in Flink: Dynamic Table
Options &
Query Hints. As described in the Flip, "WATERMARK_PARAMS" is
more
like
Dynamic Table Options. So two questions arise here:
1) Are these watermark options to be exposed as connector WITH
options? Aa
described in SQL Hints doc[1], "Dynamic Table Options allow to
specify or
override table options dynamically", which implies that these
options
can
also be configured in WITH options.
2) Do we really need a new hint name like 'WATERMARK_PARAMS',
table
options use "OPTIONS" as hint name, like '/*+
OPTIONS('csv.ignore-parse-errors'='true') */', maybe we can
enrich
more
table option keys for watermark, e.g., /*+
OPTIONS('watermark.emit-strategy'='ON_PERIODIC') */.
[1]
https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/dev/table/sql/queries/hints/
On Wed, Feb 22, 2023 at 10:22 AM kui yuan <catye...@gmail.com>
wrote:
Hi devs,
I'd like to start a discussion thread for FLIP-296[1]. This
comes
from an
offline discussion with @Yun Tang, and we hope to enrich table
API
&
SQL
to
support many watermark-related features which were only
implemented
at
the
datastream API level.
Basically, we want to introduce watermark options in table
API &
SQL
via
SQL hint named 'WATERMARK_PARAMS' to support features:
1、Configurable watermark emit strategy
2、Dealing with idle sources
3、Watermark alignment
Last but not least, thanks to Qingsheng and Jing Zhang for the
initial
reviews.
Looking forward to your thoughts and any feedback is
appreciated!
[1]
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=240884405
Best
Yuan Kui