marton-bod commented on a change in pull request #2052:
URL: https://github.com/apache/iceberg/pull/2052#discussion_r554892044
##########
File path: mr/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergSerDe.java
##########
@@ -82,7 +81,17 @@ public void initialize(@Nullable Configuration configuration, Properties serDePr
}
String[] selectedColumns = ColumnProjectionUtils.getReadColumnNames(configuration);
- Schema projectedSchema = selectedColumns.length > 0 ? tableSchema.select(selectedColumns) : tableSchema;
+ // When same table is joined multiple times, it is possible some selected columns are duplicated,
+ // in this case wrong recordStructField position leads wrong value or ArrayIndexOutOfBoundException
+ String[] distinctSelectedColumns = Arrays.stream(selectedColumns).distinct().toArray(String[]::new);
+ Schema projectedSchema = distinctSelectedColumns.length > 0 ?
+ tableSchema.select(distinctSelectedColumns) : tableSchema;
+ // the input split mapper handles does not belong to this table
+ // it is necessary to ensure projectedSchema equals to tableSchema,
+ // or we cannot find selectOperator's column from inspector
+ if (projectedSchema.columns().size() != distinctSelectedColumns.length) {
+ projectedSchema = tableSchema;
+ }
Review comment:
Looks good generally, but I wanted to clarify this comment:
` // the input split mapper handles does not belong to this table
// it is necessary to ensure projectedSchema equals to tableSchema,
// or we cannot find selectOperator's column from inspector
`
Just for my understanding, can you give an example of a scenario in which `Schema.select()` returns a different number of columns than were requested?
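To make the question concrete, here is a minimal Java sketch of the projection logic in this hunk. It rests on one assumption: that `Schema.select()` silently skips names the table does not contain, collapses duplicates and keeps table order. The hypothetical helper `selectByName` models that behaviour; it is not the Iceberg API. `ProjectionSketch` and `projectColumns` are illustrative names only. Under that assumption, a read-column list that belongs to the other side of a join (the "split does not belong to this table" case from the code comment) yields fewer columns than requested, which triggers the fallback to the full table schema.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ProjectionSketch {

  // Hypothetical stand-in for Schema.select(String...) (assumed behaviour):
  // keeps table column order, skips unknown names, never repeats a column.
  static List<String> selectByName(List<String> tableColumns, String... names) {
    Set<String> wanted = new LinkedHashSet<>(Arrays.asList(names));
    List<String> result = new ArrayList<>();
    for (String column : tableColumns) {
      if (wanted.contains(column)) {
        result.add(column);
      }
    }
    return result;
  }

  // Mirrors the patch: de-duplicate the requested names, project, and fall back
  // to the full table when the projection size differs from the request size.
  static List<String> projectColumns(List<String> tableColumns, String[] selectedColumns) {
    String[] distinct = Arrays.stream(selectedColumns).distinct().toArray(String[]::new);
    List<String> projected =
        distinct.length > 0 ? selectByName(tableColumns, distinct) : tableColumns;
    if (projected.size() != distinct.length) {
      projected = tableColumns;
    }
    return projected;
  }

  public static void main(String[] args) {
    List<String> table = Arrays.asList("id", "name", "ts");
    // Self-join: "id" is requested twice; de-duplication keeps sizes aligned.
    System.out.println(projectColumns(table, new String[] {"id", "name", "id"})); // prints [id, name]
    // Columns of the other joined table: "amount" is unknown here, so fall back.
    System.out.println(projectColumns(table, new String[] {"id", "amount"})); // prints [id, name, ts]
  }
}
```

Without the `distinct()` step, the self-join case would compare 2 projected columns against 3 requested names, so this size check would wrongly fall back to the full schema as well.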