Hi all,

After some offline discussion with Dawid, I changed my mind. We still agree
on that
we should only support reading from tables with primary key, forbid writing
to such table.

But the enforce flag should be introduced with primary key, and probably
should be
`NOR ENFORCE` be default. The problem with making `ENFORCE` by default is
when
creating a table with primary key constraint. If we follow the SQL standard
here, we should
do the validity check during the creating operation, and this will
introduce a really big
burden here and might make supporting this feature not feasible.

Best,
Kurt


On Thu, Nov 21, 2019 at 10:25 AM Kurt Young <[email protected]> wrote:

> Hi Dawid,
>
> Actually I'm not 100% sure about both choices: always enforce primary key
> by Flink and not enforce at all.
>
> As you said, always enforce primary key is hard to achieve since Flink
> doesn't
> own data most of the time. But with some certain cases, it's actually
> doable, like
> writing a new table in a batch job, or if we have upsert flag supported in
> message
> queue. So it's hard to say Flink has no chance to enforce primary key
> here.
>
> On the other hand, not enforce primary key indeed save us a lot of work,
> but also
> would cause confusing sometimes. Especially when the table is written and
> read
> by Flink only. For example, some user created a table with primary key not
> enforced,
> and they will use it as source and sink table. This situation is quite
> confusing, because
> Flink didn't enforce primary key when writing, so it's hard to say we can
> rely on such
> information during read. It looks we are contradictory with ourself.
>
> AFAIK, most databases will only do validity check only when writing to
> table, and
> simply trust such information during read. I think we could also follow
> this way. Since
> we are not sure yet what kind of behavior when writing to a table with
> primary key, I would
> suggest to defer this util we start to talk about the source and sink
> behavior with primary
> key, maybe as well as introducing update, delete flag.
>
> To summarize, my proposal would be:
> 1. Add primary key but not introducing enforce flag now
> 2. Table with primary key can't be used as sink, and we will simply trust
> pk info during read.
>
> Best,
> Kurt
>
>
> On Wed, Nov 20, 2019 at 9:26 PM Dawid Wysakowicz <[email protected]>
> wrote:
>
>> Hi Kurt,
>>
>> You are right that this is included in the SQL standard.
>>
>> The problem is we are not a database and we do not own the data. There
>> is no way for us to truly enforce the uniqueness constraint. We are
>> always writing to some external systems that, moreover we are not an
>> exclusive produce/consumer for that data. We cannot ensure that external
>> systems will not corrupt the data.
>>
>> The way RDBM systems ensure uniqueness constraint is that they keep the
>> index of all keys for a table that they check against for every write.
>> This works only if the system that maintains the index is the only
>> writer to that table.
>>
>> Moreover some of the systems also allow loosening this restriction e.g.
>>
>> DB2:
>>
>> https://www.ibm.com/support/knowledgecenter/en/SSEPGG_10.5.0/com.ibm.db2.luw.wn.doc/doc/c0061153.html
>>
>> HIVE(with a non standard syntax, they do it for the same reasons as
>> listed above):
>>
>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Constraints
>>
>>
>> What do you think?
>>
>> Best,
>>
>> Dawid
>>
>>
>> On 20/11/2019 13:51, Kurt Young wrote:
>> > Hi all,
>> >
>> > Thanks Dawid for bringing this up, this is very important property of
>> SQL
>> > and
>> > I'm big +1 to have this.
>> >
>> > But before the discussion going deeply, I want to first mention that
>> > according to
>> > SQL standard, any unique key constraint which including primary key,
>> should
>> > be
>> > always be enforced.
>> >
>> > Please see:
>> >
>> > 4.18.3.1 Introduction to table constraints
>> >
>> > "NOTE 50 — A unique constraint is always enforced. A table check
>> constraint
>> > or
>> > a referential constraint can either be enforced or not enforced."
>> >
>> > Best,
>> > Kurt
>> >
>> >
>> > On Wed, Nov 20, 2019 at 5:43 PM Dawid Wysakowicz <
>> [email protected]>
>> > wrote:
>> >
>> >> Hey,
>> >>
>> >> After an offline discussion I updated the FLIP slightly. In the initial
>> >> FLIP I put the OUT_OF_LINE definition at a wrong position. I moved it
>> >> into the statement paranthesis. I also moved the primary key to the
>> >> TableSchema class.
>> >>
>> >> I also extended sections about UNIQUE key support and how do the
>> primary
>> >> keys affect sources and sinks.
>> >>
>> >> Please have another look. If there are no further concerns, I would
>> like
>> >> to start the vote soon.
>> >>
>> >> Best,
>> >>
>> >> Dawid
>> >>
>> >> On 20/11/2019 04:32, Jark Wu wrote:
>> >>> Hi Dawid,
>> >>>
>> >>> Thanks for driving this. Primary key is a very important and useful
>> >> feature
>> >>> to Flink Table API / SQL.
>> >>>
>> >>> I'm +1 to the general design.
>> >>>
>> >>> And have some thoughts as following:
>> >>>
>> >>> 1. > The design says Flink only support NOT ENFORCED mode. But the DDL
>> >>> and KeyConstraint#primaryKey(..) can pass in ENFORCED mode.
>> >>>     What will we do when user specify an ENFORCED mode?  Shall we
>> forbid
>> >>> it?
>> >>>
>> >>> 2. > "When creating a table, creating a primary key constraint will
>> alter
>> >>> the columns nullability."
>> >>>     I think we should force the columns to be declared NOT NULL,
>> instead
>> >> of
>> >>> change the columns nullability implicitly.
>> >>>     Otherwise, it may be confused what is the nullbility of the
>> primary
>> >> key
>> >>> columns.
>> >>>
>> >>> 3. > "Support for Unique key is not part of the FLIP. It is just
>> >> mentioned
>> >>> to show how can we extend the primary key concept with more
>> constraints
>> >> in
>> >>> the future."
>> >>>     I think we can include Unique Key as part of the FLIP, as it is
>> >> already
>> >>> stated clearly. We can put the implemenation task of unique key in a
>> >> lower
>> >>> priority.
>> >>>
>> >>> 4. > Method for retrieving primary key constraint is in
>> CatalogBaseTable.
>> >>>     In SQL standard, primary key is declared in line or out of line.
>> If
>> >> it
>> >>> is out of line, it is still in the schema part, i.e. besides column
>> >>> definitions and in the parentheses.
>> >>>     So I think maybe primary key should belong to `TableSchema`
>> becuase
>> >> it
>> >>> is a part of schema.
>> >>>
>> >>>     CREATE TABLE xxx (
>> >>>        a int,
>> >>>        b varchar,
>> >>>        c bigint,
>> >>>        primary key (a, b)
>> >>>     ) with (...)
>> >>>
>> >>>
>> >>> Best,
>> >>> Jark
>> >>>
>> >>>
>> >>> On Tue, 19 Nov 2019 at 23:18, Dawid Wysakowicz <
>> [email protected]>
>> >>> wrote:
>> >>>
>> >>>> Hi,
>> >>>>
>> >>>> I wanted to bring up the topic of primary key constraints that we
>> could
>> >>>> leverage for runtime optimizations. Please have a look at the
>> proposal
>> >>>> and I would especially draw your attention to the topic of
>> nullability
>> >>>> of columns that are part of a primary key. Some time in the future we
>> >>>> will also be able to leverage it for upserting streaming tables.
>> >>>>
>> >>>> Here is the proposal:
>> >>>>
>> >>>>
>> >>
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP+87%3A+Primary+key+constraints+in+Table+API
>> >>>> Best,
>> >>>>
>> >>>> Dawid
>> >>>>
>> >>>>
>> >>>>
>> >>
>>
>>

Reply via email to