Hi Timo,

Thanks for starting this discussion. I really like it!
The FLIP is already in good shape; I only have some minor comments.

1. Could we also support the HASH and RANGE distribution kinds in the DDL
syntax? I noticed that HASH and UNKNOWN are introduced in the Java API, but
not in the syntax.

2. Can we make "INTO n BUCKETS" optional in CREATE TABLE and ALTER TABLE?
Some storage engines can automatically determine the number of buckets
based on the cluster resources and the data size of the table, for example
StarRocks [1] and Paimon [2].
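To make both suggestions concrete, here is a rough sketch of what the DDL
could look like. The keyword placement is only my assumption and is not
syntax defined in the FLIP; 'my-connector' is a hypothetical connector name.

```sql
-- Suggestion 1 (hypothetical syntax): state the distribution kind explicitly.
CREATE TABLE MyHashTable (uid BIGINT, name STRING)
  DISTRIBUTED BY HASH(uid) INTO 6 BUCKETS
  WITH ('connector' = 'kafka');

CREATE TABLE MyRangeTable (uid BIGINT, name STRING)
  DISTRIBUTED BY RANGE(uid) INTO 6 BUCKETS
  WITH ('connector' = 'my-connector');

-- Suggestion 2 (hypothetical syntax): omit "INTO n BUCKETS" and let the
-- storage engine choose the number of buckets.
CREATE TABLE MyAutoBucketTable (uid BIGINT, name STRING)
  DISTRIBUTED BY HASH(uid)
  WITH ('connector' = 'my-connector');
```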

Best,
Jark

[1] https://docs.starrocks.io/en-us/latest/table_design/Data_distribution#determine-the-number-of-buckets
[2] https://paimon.apache.org/docs/0.5/concepts/primary-key-table/#dynamic-bucket

On Thu, 26 Oct 2023 at 18:26, Jingsong Li <jingsongl...@gmail.com> wrote:

> Many thanks, Timo, for starting this discussion.
>
> Big +1 for this.
>
> The design looks good to me!
>
> We could add some documentation for connector developers. For example,
> for sinks: if a keyBy is needed, the connector has to perform the keyBy
> itself. SupportsBucketing is just a marker interface.
>
> Best,
> Jingsong
>
> On Thu, Oct 26, 2023 at 5:00 PM Timo Walther <twal...@apache.org> wrote:
> >
> > Hi everyone,
> >
> > I would like to start a discussion on FLIP-376: Add DISTRIBUTED BY
> > clause [1].
> >
> > Many SQL vendors expose the concepts of Partitioning, Bucketing, and
> > Clustering. This FLIP continues the work of previous FLIPs and would
> > like to introduce the concept of "Bucketing" to Flink.
> >
> > This is a pure connector characteristic and helps both the Apache Kafka
> > and Apache Paimon connectors avoid a complex WITH clause by providing
> > improved syntax.
> >
> > Here is an example:
> >
> > CREATE TABLE MyTable
> >    (
> >      uid BIGINT,
> >      name STRING
> >    )
> >    DISTRIBUTED BY (uid) INTO 6 BUCKETS
> >    WITH (
> >      'connector' = 'kafka'
> >    )
> >
> > The full syntax specification can be found in the document. The clause
> > should be optional and fully backwards compatible.
> >
> > Regards,
> > Timo
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-376%3A+Add+DISTRIBUTED+BY+clause
>
