szlta commented on a change in pull request #3980:
URL: https://github.com/apache/iceberg/pull/3980#discussion_r794524784



##########
File path: hive3/src/main/java/org/apache/iceberg/mr/hive/vector/ParquetSchemaFieldNameVisitor.java
##########
@@ -55,17 +58,25 @@ public Type struct(Types.StructType expected, GroupType struct, List<Type> field
 
     for (Types.NestedField field : expectedFields) {
       int id = field.fieldId();
-      if (id != MetadataColumns.ROW_POSITION.fieldId() && id != MetadataColumns.IS_DELETED.fieldId()) {
-        Type fieldInFileSchema = typesById.get(id);
-        if (fieldInFileSchema == null) {
-          // New field - not in this parquet file yet, add the new field name instead of null
+      if (id == MetadataColumns.ROW_POSITION.fieldId() || id == MetadataColumns.IS_DELETED.fieldId()) {
+        continue;
+      }
+      Type fieldInPrunedFileSchema = typesById.get(id);
+      if (fieldInPrunedFileSchema == null) {
+        if (!originalFileSchema.containsField(field.name())) {
+          // Must be a new field - it isn't in this parquet file yet, so add the new field name instead of null
           appendToColNamesList(isMessageType, field.name());
         } else {
-          // Already present column in this parquet file, add the original name
-          types.add(fieldInFileSchema);
-          appendToColNamesList(isMessageType, fieldInFileSchema.getName());
+          // This field is found in the parquet file with a different ID, so it must have been recreated since.
+          // Inserting a dummy col name to force Hive Parquet reader returning null for this column.
+          appendToColNamesList(isMessageType, DUMMY_COL_NAME);

Review comment:
      Got it ;) Good point, actually. I checked this case, and luckily the reader isn't bothered by seeing multiple (identical) dummy names in the column list. For each such dummy, null values are returned to Hive, which is the correct behaviour.
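The following minimal, self-contained Java sketch makes the two branches and the behaviour described above concrete. It is a simplified model under stated assumptions, not the actual Iceberg code. Iceberg and Parquet types are replaced by plain maps and sets. The skipping of the `ROW_POSITION`/`IS_DELETED` metadata columns is omitted. The matching-ID branch (not shown in this hunk) is assumed to append the file's column name, as the removed lines did. The value of `DUMMY_COL_NAME` is a hypothetical placeholder. Finally, `readRow` only simulates the Hive Parquet reader by returning null for any requested name the file does not contain.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of the column-name resolution in the diff above.
// Iceberg/Parquet types are replaced with plain maps and sets; metadata columns are not modelled.
class DummyColumnSketch {
  // Assumed placeholder value; the real DUMMY_COL_NAME constant is not shown in the diff.
  static final String DUMMY_COL_NAME = "_dummy_col_";

  // An expected (table) field: Iceberg field ID plus its current name.
  record Field(int id, String name) {}

  // Builds the column-name list handed to the (simulated) reader.
  // fileIdsToNames: field ID -> column name as stored in the Parquet file.
  // originalFileNames: every column name present in the Parquet file.
  static List<String> columnNames(
      List<Field> expected, Map<Integer, String> fileIdsToNames, Set<String> originalFileNames) {
    List<String> names = new ArrayList<>();
    for (Field field : expected) {
      String nameInFile = fileIdsToNames.get(field.id());
      if (nameInFile != null) {
        // Assumed branch: same ID in the file, so read the column under its name in the file.
        names.add(nameInFile);
      } else if (!originalFileNames.contains(field.name())) {
        // New field, not in this file yet: the name matches no file column, so nulls come back.
        names.add(field.name());
      } else {
        // Same name but a different ID: the column was recreated, so use the placeholder
        // instead of the name, which would otherwise return the old column's data.
        names.add(DUMMY_COL_NAME);
      }
    }
    return names;
  }

  // Simulated reader: each requested name is looked up in the file row; unknown names give null.
  // A repeated placeholder name is simply looked up again and yields null each time.
  static List<Object> readRow(List<String> requested, Map<String, Object> fileRow) {
    List<Object> values = new ArrayList<>();
    for (String name : requested) {
      values.add(fileRow.get(name));
    }
    return values;
  }

  public static void main(String[] args) {
    // File written when the table had id (ID 1), name (ID 2) and score (ID 3).
    Map<Integer, String> fileIdsToNames = Map.of(1, "id", 2, "name", 3, "score");
    Set<String> originalFileNames = Set.of("id", "name", "score");
    // Later, name and score were dropped and re-added (new IDs 4 and 5), and email (ID 6) was added.
    List<Field> expected = List.of(
        new Field(1, "id"), new Field(4, "name"), new Field(5, "score"), new Field(6, "email"));

    List<String> names = columnNames(expected, fileIdsToNames, originalFileNames);
    System.out.println(names); // [id, _dummy_col_, _dummy_col_, email]

    List<Object> row = readRow(names, Map.of("id", 7, "name", "alice", "score", 42));
    System.out.println(row); // [7, null, null, null]
  }
}
```

Here the recreated `name` and `score` columns both map to the same placeholder. In this simulated reader, each occurrence resolves to null independently, which mirrors what the comment reports for the real Hive reader.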




-- 
This is an automated message from the Apache Git Service.