empcl commented on code in PR #11006:
URL: https://github.com/apache/hudi/pull/11006#discussion_r1565442379


##########
hudi-sync/hudi-sync-common/src/main/java/org/apache/hudi/sync/common/util/Parquet2SparkSchemaUtils.java:
##########
@@ -140,7 +141,7 @@ private static String convertGroupField(GroupType field) {
         ValidationUtils.checkArgument(field.getFieldCount() == 1, "Illegal List type: " + field);
         Type repeatedType = field.getType(0);
         if (isElementType(repeatedType, field.getName())) {
-          return arrayType(repeatedType, false);
+          return arrayType(repeatedType, true);

Review Comment:
   Let me explain the background of this issue. When Flink creates a Hudi table containing array-type fields, the array elements default to non-nullable. However, when Spark reads data from the Hive table and writes it into the Hudi table, the Spark SQL engine assumes that array elements are nullable, which causes inconsistencies during field and type validation. Since the Spark SQL engine treats all fields as nullable by default, my understanding is that when Flink creates the table, we can simply mark array-type field elements as nullable as well.
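   To make the mismatch concrete, here is a minimal, self-contained sketch (not the actual Hudi utility; `arrayTypeJson` is a hypothetical stand-in for the `arrayType` helper in `Parquet2SparkSchemaUtils`) of how the generated Spark SQL schema JSON differs depending on the `containsNull` flag that this patch flips from `false` to `true`:
   
   ```java
   public class ArrayNullabilityDemo {
       // Hypothetical stand-in: builds the Spark SQL JSON representation
       // of an array type with the given element type and nullability.
       static String arrayTypeJson(String elementType, boolean containsNull) {
           return "{\"type\":\"array\",\"elementType\":\"" + elementType
                   + "\",\"containsNull\":" + containsNull + "}";
       }
   
       public static void main(String[] args) {
           // Before the patch: schema synced from a Flink-created table
           // declares array elements as non-nullable.
           System.out.println(arrayTypeJson("string", false));
           // After the patch: elements are marked nullable, matching
           // Spark SQL's default assumption that everything is nullable.
           System.out.println(arrayTypeJson("string", true));
       }
   }
   ```
   
   With `containsNull` mismatched between the synced schema and what Spark SQL infers, schema validation on write fails even though the data itself is compatible.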



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
