szlta commented on a change in pull request #3980:
URL: https://github.com/apache/iceberg/pull/3980#discussion_r794524784
##########
File path:
hive3/src/main/java/org/apache/iceberg/mr/hive/vector/ParquetSchemaFieldNameVisitor.java
##########
@@ -55,17 +58,25 @@ public Type struct(Types.StructType expected, GroupType struct, List<Type> field
for (Types.NestedField field : expectedFields) {
int id = field.fieldId();
- if (id != MetadataColumns.ROW_POSITION.fieldId() && id != MetadataColumns.IS_DELETED.fieldId()) {
- Type fieldInFileSchema = typesById.get(id);
- if (fieldInFileSchema == null) {
- // New field - not in this parquet file yet, add the new field name instead of null
+ if (id == MetadataColumns.ROW_POSITION.fieldId() || id == MetadataColumns.IS_DELETED.fieldId()) {
+ continue;
+ }
+ Type fieldInPrunedFileSchema = typesById.get(id);
+ if (fieldInPrunedFileSchema == null) {
+ if (!originalFileSchema.containsField(field.name())) {
+ // Must be a new field - it isn't in this parquet file yet, so add the new field name instead of null
appendToColNamesList(isMessageType, field.name());
} else {
- // Already present column in this parquet file, add the original name
- types.add(fieldInFileSchema);
- appendToColNamesList(isMessageType, fieldInFileSchema.getName());
+ // This field is found in the parquet file with a different ID, so it must have been recreated since.
+ // Inserting a dummy col name to force Hive Parquet reader returning null for this column.
+ appendToColNamesList(isMessageType, DUMMY_COL_NAME);
Review comment:
Got it ;) Good point actually. I checked this case, and luckily the reader
isn't bothered by seeing the same dummy name more than once in the column
list. For each such dummy entry, null values are returned to Hive, which is
the correct behaviour.
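
To make the case concrete, here is a minimal, self-contained sketch of the name-resolution branches discussed in this hunk. It is not the PR's code: ColumnNameResolver, ExpectedField, the plain maps and sets standing in for the Parquet schema objects, and the DUMMY_COL_NAME value are all hypothetical, and the "field found by ID" branch is assumed to keep the old behaviour of using the file's own column name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the visitor logic; it uses plain collections
// instead of the Iceberg and Parquet schema types.
final class ColumnNameResolver {
  // Assumed placeholder value; the real constant's value is not shown in this thread.
  static final String DUMMY_COL_NAME = "<<DUMMY_COL>>";

  // Simplified view of an expected (table schema) field: just ID and name.
  static final class ExpectedField {
    final int id;
    final String name;

    ExpectedField(int id, String name) {
      this.id = id;
      this.name = name;
    }
  }

  static List<String> resolve(
      List<ExpectedField> expectedFields,
      Map<Integer, String> prunedFileNamesById,
      Set<String> originalFileFieldNames,
      Set<Integer> metadataColumnIds) {
    List<String> colNames = new ArrayList<>();
    for (ExpectedField field : expectedFields) {
      if (metadataColumnIds.contains(field.id)) {
        // Metadata columns (row position, is-deleted) are skipped entirely.
        continue;
      }
      String nameInPrunedFile = prunedFileNamesById.get(field.id);
      if (nameInPrunedFile != null) {
        // Found by ID: use the name the file knows the column by (assumed unchanged behaviour).
        colNames.add(nameInPrunedFile);
      } else if (!originalFileFieldNames.contains(field.name)) {
        // Not in the file under any ID: a new field, so pass its name through.
        colNames.add(field.name);
      } else {
        // Same name but a different ID in the file: the column was recreated.
        // Several such fields all receive the same dummy name.
        colNames.add(DUMMY_COL_NAME);
      }
    }
    return colNames;
  }
}
```

With two recreated columns in the expected schema, the resulting list contains the dummy name twice, which is the situation checked above.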