empcl commented on code in PR #11006:
URL: https://github.com/apache/hudi/pull/11006#discussion_r1565442379
##########
hudi-sync/hudi-sync-common/src/main/java/org/apache/hudi/sync/common/util/Parquet2SparkSchemaUtils.java:
##########
@@ -140,7 +141,7 @@ private static String convertGroupField(GroupType field) {
      ValidationUtils.checkArgument(field.getFieldCount() == 1, "Illegal List type: " + field);
Type repeatedType = field.getType(0);
if (isElementType(repeatedType, field.getName())) {
- return arrayType(repeatedType, false);
+ return arrayType(repeatedType, true);
Review Comment:
Let me give some background on this question. Currently, when Flink creates
a Hudi table that contains array-type fields, the array elements default to
non-nullable. However, when Spark reads data from the Hive table and writes
it to the Hudi table, the Spark SQL engine assumes that array elements can be
nullable, which causes inconsistencies during field and type validation.
Since the Spark SQL engine defaults all fields to nullable, my understanding
is that when Flink creates the table, the array-type field elements could be
marked nullable directly as well.
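To make the effect of the one-line change concrete, here is a minimal sketch (not the actual Hudi implementation; the class name and helper signature are hypothetical) of what an `arrayType(elementType, containsNull)`-style helper emits in Spark SQL's JSON schema notation. The boolean is the flag flipped in this PR: `containsNull = true` declares array elements nullable, matching Spark's default.

```java
// Hypothetical sketch: emits the Spark SQL JSON schema fragment for an
// array field, parameterized by the element-nullability flag.
public class ArrayTypeSketch {

  // containsNull = false was the old behavior (elements non-nullable);
  // containsNull = true matches Spark SQL's default for array elements.
  public static String arrayType(String elementType, boolean containsNull) {
    return "{\"type\":\"array\",\"elementType\":\"" + elementType
        + "\",\"containsNull\":" + containsNull + "}";
  }

  public static void main(String[] args) {
    // Before the fix: elements declared non-nullable.
    System.out.println(arrayType("integer", false));
    // After the fix: elements declared nullable.
    System.out.println(arrayType("integer", true));
  }
}
```

With `containsNull = false`, a Spark reader validating its (nullable-by-default) schema against the synced one sees a type mismatch on the array field; flipping the flag to `true` makes the two schemas agree.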
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]