dawidwys commented on issue #10213: [FLINK-12846][table-common] Carry primary key information in TableSchema
URL: https://github.com/apache/flink/pull/10213#issuecomment-555056645

Hi @wuchong,

First of all, thank you for outlining the original purpose of this change. I am a bit hesitant, though, to introduce a new foundational concept into a core part of the Table ecosystem without a good understanding of it. That's why I would appreciate reaching an agreement within the community. I do want to emphasize that I definitely want to help find a solution that we can introduce in 1.10 to improve the TPC-DS performance.

I did a bit of research on `PRIMARY` and `UNIQUE` keys, and I have one concern regarding the semantics of a `PRIMARY KEY`. According to the SQL standard and most RDBMSs, a `PRIMARY KEY` constraint enforces a `NOT NULL` constraint on all columns that are part of the key. That's not the case for `UNIQUE` keys. There, null values are treated as distinct from each other for the purpose of the constraint, so you may have multiple rows with a null unique key.

Unfortunately, this is not reflected in Hive. Hive (at least before 3.0) has no support for `NOT NULL` types. Therefore, a `PRIMARY KEY` in Hive is not actually a `PRIMARY KEY` in the SQL-standard sense. I think this issue alone is reason enough to bring the discussion to the mailing list (ML).
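To make the difference between the two constraint semantics concrete, here is a minimal Python sketch. It is not Flink or Hive code; the functions `satisfies_primary_key` and `satisfies_unique` are hypothetical helpers that check a list of rows against each rule. The `UNIQUE` check follows the SQL-standard treatment of NULLs as distinct. Not every RDBMS behaves this way (SQL Server, for example, allows only one NULL in a unique column).

```python
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]


def _key(row: Row, cols: Sequence[str]) -> Tuple[Any, ...]:
    # Project a row onto the key columns; None stands for SQL NULL.
    return tuple(row[c] for c in cols)


def satisfies_primary_key(rows: List[Row], cols: Sequence[str]) -> bool:
    """PRIMARY KEY semantics: every key column is NOT NULL and keys are distinct."""
    seen = set()
    for row in rows:
        key = _key(row, cols)
        if any(v is None for v in key):
            return False  # PRIMARY KEY implies NOT NULL on all key columns
        if key in seen:
            return False  # duplicate key value
        seen.add(key)
    return True


def satisfies_unique(rows: List[Row], cols: Sequence[str]) -> bool:
    """UNIQUE semantics (SQL standard): NULLs are never equal to each other,
    so a row with a NULL in any key column cannot conflict with another row."""
    seen = set()
    for row in rows:
        key = _key(row, cols)
        if any(v is None for v in key):
            continue  # NULL keys are never considered duplicates
        if key in seen:
            return False
        seen.add(key)
    return True


if __name__ == "__main__":
    rows = [{"id": 1}, {"id": None}, {"id": None}]
    print(satisfies_unique(rows, ["id"]))       # True: two NULLs do not collide
    print(satisfies_primary_key(rows, ["id"]))  # False: NULL is not allowed in a PK
```

A key column that cannot be declared `NOT NULL`, as in Hive before 3.0, may hold data like `rows` above. Such data passes the `UNIQUE` check but not the `PRIMARY KEY` check, which is the mismatch described in the comment.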
