[GitHub] [hudi] danny0405 commented on a diff in pull request #7009: [HUDI-5058]Fix flink catalog read spark table error : primary key col can not be nullable

GitBox Fri, 21 Oct 2022 17:45:31 -0700


danny0405 commented on code in PR #7009:
URL: https://github.com/apache/hudi/pull/7009#discussion_r1002288141



##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/catalog/HoodieHiveCatalog.java:
##########
@@ -397,17 +399,22 @@ public CatalogBaseTable getTable(ObjectPath tablePath) throws TableNotExistExcep
     String path = hiveTable.getSd().getLocation();
     Map<String, String> parameters = hiveTable.getParameters();
     Schema latestTableSchema = StreamerUtil.getLatestTableSchema(path, hiveConf);
+    String pkColumnsStr = parameters.get(FlinkOptions.RECORD_KEY_FIELD.key());
+    List<String> pkColumns = StringUtils.isNullOrEmpty(pkColumnsStr)
+        ? null : StringUtils.split(pkColumnsStr, ",");
     org.apache.flink.table.api.Schema schema;
     if (latestTableSchema != null) {
+      // if the table is initialized from spark, the write schema is nullable for pk columns.
+      DataType tableDataType = DataTypeUtils.ensureColumnsAsNonNullable(

Review Comment:
   This is common behavior: a column is nullable by default if the user does not declare its nullability in the DDL. Primary key columns, however, must be forced to be non-nullable.
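   The nullability rule above can be illustrated with a small, self-contained Java sketch. It deliberately avoids Flink and Hudi types: `Column`, `parsePrimaryKeys` and `forcePrimaryKeysNonNullable` are hypothetical names standing in for the catalog logic (the actual patch calls `DataTypeUtils.ensureColumnsAsNonNullable`, whose full signature is not shown in this excerpt). Columns keep their declared nullability, and every column named in the comma-separated record key option is rewritten as non-nullable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class PkNullability {

  /** Hypothetical, simplified column model: name, type name and nullability. */
  static final class Column {
    final String name;
    final String type;
    final boolean nullable;

    Column(String name, String type, boolean nullable) {
      this.name = name;
      this.type = type;
      this.nullable = nullable;
    }

    @Override
    public String toString() {
      return name + " " + type + (nullable ? "" : " NOT NULL");
    }
  }

  /** Splits a comma-separated record key option; null or empty means "no primary key". */
  static List<String> parsePrimaryKeys(String pkColumnsStr) {
    if (pkColumnsStr == null || pkColumnsStr.trim().isEmpty()) {
      return Collections.emptyList();
    }
    List<String> keys = new ArrayList<>();
    for (String key : pkColumnsStr.split(",")) {
      String trimmed = key.trim();
      if (!trimmed.isEmpty()) {
        keys.add(trimmed);
      }
    }
    return keys;
  }

  /** Returns a copy of the columns in which every primary key column is non-nullable. */
  static List<Column> forcePrimaryKeysNonNullable(List<Column> columns, List<String> pkColumns) {
    List<Column> result = new ArrayList<>();
    for (Column c : columns) {
      boolean isPk = pkColumns.contains(c.name);
      // Non-key columns keep whatever nullability the schema declared.
      result.add(isPk ? new Column(c.name, c.type, false) : c);
    }
    return result;
  }

  public static void main(String[] args) {
    // A schema as a Spark-initialized table might expose it: every column nullable.
    List<Column> columns = Arrays.asList(
        new Column("uuid", "STRING", true),
        new Column("name", "STRING", true),
        new Column("ts", "BIGINT", true));
    List<Column> fixed = forcePrimaryKeysNonNullable(columns, parsePrimaryKeys("uuid"));
    System.out.println(fixed); // [uuid STRING NOT NULL, name STRING, ts BIGINT]
  }
}
```

   Returning an empty list instead of `null` (the diff returns `null` when no record key is configured) is a design choice of this sketch that spares callers a null check.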
   
   Flink generates the correct Avro schema if the table was initialized from a Flink app. What we fix here is a table created by Spark, so I guess Spark does not take the primary key constraint into account when deriving nullability somewhere.
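   To make the suspected Spark-side cause concrete: in Avro, an optional field is conventionally encoded as a union with `null`, such as `["null", "string"]`, and an Avro-to-Flink conversion treats such a field as nullable. The sketch below models that with plain strings rather than the Avro library, assuming (as the comment suspects) that the Spark writer stores key columns as nullable unions. `AvroNullability`, `isNullableUnion` and `firstNullablePrimaryKey` are hypothetical names, a non-union Avro type is modeled as a one-element list, and the check only mimics the "primary key col can not be nullable" error from the PR title.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AvroNullability {

  /** A field type given as its Avro union branches; a union containing "null" is nullable. */
  static boolean isNullableUnion(List<String> unionBranches) {
    return unionBranches.contains("null");
  }

  /** Mimics a primary key check: returns the first nullable key column, or null if all are non-nullable. */
  static String firstNullablePrimaryKey(Map<String, List<String>> fields, List<String> pkColumns) {
    for (String pk : pkColumns) {
      List<String> branches = fields.get(pk);
      if (branches == null) {
        throw new IllegalArgumentException("Unknown primary key column: " + pk);
      }
      if (isNullableUnion(branches)) {
        return pk;
      }
    }
    return null;
  }

  public static void main(String[] args) {
    // Assumed Spark-written schema: the key column is an optional (nullable) union.
    Map<String, List<String>> sparkWritten = new LinkedHashMap<>();
    sparkWritten.put("uuid", Arrays.asList("null", "string"));
    sparkWritten.put("ts", Arrays.asList("null", "long"));

    // Flink-written schema per the comment: the key column is a required, non-union type.
    Map<String, List<String>> flinkWritten = new LinkedHashMap<>();
    flinkWritten.put("uuid", Arrays.asList("string"));
    flinkWritten.put("ts", Arrays.asList("null", "long"));

    List<String> pk = Arrays.asList("uuid");
    System.out.println(firstNullablePrimaryKey(sparkWritten, pk)); // uuid
    System.out.println(firstNullablePrimaryKey(flinkWritten, pk)); // null
  }
}
```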



