dawidwys commented on issue #10213: [FLINK-12846][table-common] Carry primary 
key information in TableSchema
URL: https://github.com/apache/flink/pull/10213#issuecomment-555056645
 
 
   Hi @wuchong.
   First of all, thank you for outlining the original purpose of this change. I 
am a bit hesitant, though, to introduce a new fundamental concept to a core part 
of the Table ecosystem without a good understanding of it. That's why I would 
appreciate an agreement within the community. I do want to emphasize that I 
definitely want to help find a solution that we can introduce in 1.10 and that 
improves the TPC-DS performance.
   
   I did a bit of research on `PRIMARY KEY` and `UNIQUE` constraints. I have 
one concern regarding the semantics of a `PRIMARY KEY`. According to the SQL 
standard and most RDBMSs, a `PRIMARY KEY` constraint enforces a `NOT NULL` 
constraint on all columns that are part of the key. That's not the case for 
`UNIQUE` keys; there, null values are assumed not to be equal with respect to 
the constraint (you may have multiple rows with a null unique key). 
Unfortunately, this is not reflected in Hive. In Hive (at least pre 3.0) there 
was/is no support for `NOT NULL` types. Therefore a `PRIMARY KEY` in Hive is 
not actually a `PRIMARY KEY` in the SQL-standard sense. I think that issue 
alone is a good reason to bring this to the mailing list.
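
   To make the difference concrete, here is a minimal sketch in plain Python. 
It is a toy model of the two constraint semantics described above, not Flink or 
Hive code; the function names `satisfies_unique` and `satisfies_primary_key` 
and the sample rows are made up for illustration, and `None` stands in for SQL 
`NULL`.

```python
from typing import Dict, List, Optional, Sequence

# A row maps column names to values; None plays the role of SQL NULL.
Row = Dict[str, Optional[object]]


def satisfies_unique(rows: List[Row], key: Sequence[str]) -> bool:
    """UNIQUE semantics as described above: nulls are never equal to each other."""
    seen = set()
    for row in rows:
        values = tuple(row[col] for col in key)
        if any(v is None for v in values):
            # A key containing NULL cannot collide with any other key.
            continue
        if values in seen:
            return False
        seen.add(values)
    return True


def satisfies_primary_key(rows: List[Row], key: Sequence[str]) -> bool:
    """PRIMARY KEY semantics: implicit NOT NULL on every key column, plus uniqueness."""
    if any(row[col] is None for row in rows for col in key):
        return False
    return satisfies_unique(rows, key)


# Hypothetical sample data: two rows share a NULL id.
rows_with_nulls: List[Row] = [
    {"id": None, "name": "a"},
    {"id": None, "name": "b"},
    {"id": 1, "name": "c"},
]

unique_ok = satisfies_unique(rows_with_nulls, ["id"])
primary_key_ok = satisfies_primary_key(rows_with_nulls, ["id"])
```

   Under this model the sample rows pass the `UNIQUE` check but fail the 
`PRIMARY KEY` check, which is exactly the gap that opens up when a system 
declares a `PRIMARY KEY` without being able to enforce `NOT NULL`.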

