pvary commented on a change in pull request #3980:
URL: https://github.com/apache/iceberg/pull/3980#discussion_r794511997
##########
File path: hive3/src/main/java/org/apache/iceberg/mr/hive/vector/ParquetSchemaFieldNameVisitor.java
##########
@@ -55,17 +58,25 @@ public Type struct(Types.StructType expected, GroupType struct, List<Type> field
     for (Types.NestedField field : expectedFields) {
       int id = field.fieldId();
-      if (id != MetadataColumns.ROW_POSITION.fieldId() && id != MetadataColumns.IS_DELETED.fieldId()) {
-        Type fieldInFileSchema = typesById.get(id);
-        if (fieldInFileSchema == null) {
-          // New field - not in this parquet file yet, add the new field name instead of null
+      if (id == MetadataColumns.ROW_POSITION.fieldId() || id == MetadataColumns.IS_DELETED.fieldId()) {
+        continue;
+      }
+      Type fieldInPrunedFileSchema = typesById.get(id);
+      if (fieldInPrunedFileSchema == null) {
+        if (!originalFileSchema.containsField(field.name())) {
+          // Must be a new field - it isn't in this parquet file yet, so add the new field name instead of null
           appendToColNamesList(isMessageType, field.name());
         } else {
-          // Already present column in this parquet file, add the original name
-          types.add(fieldInFileSchema);
-          appendToColNamesList(isMessageType, fieldInFileSchema.getName());
+          // This field is found in the parquet file with a different ID, so it must have been recreated since.
+          // Inserting a dummy col name to force Hive Parquet reader returning null for this column.
+          appendToColNamesList(isMessageType, DUMMY_COL_NAME);
Review comment:
to -> two 😄