pvary commented on a change in pull request #2052:
URL: https://github.com/apache/iceberg/pull/2052#discussion_r553873950



##########
File path: mr/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergSerDe.java
##########
@@ -82,7 +81,17 @@ public void initialize(@Nullable Configuration configuration, Properties serDePr
     }
 
     String[] selectedColumns = ColumnProjectionUtils.getReadColumnNames(configuration);
-    Schema projectedSchema = selectedColumns.length > 0 ? tableSchema.select(selectedColumns) : tableSchema;
+    // When same table is joined multiple times, it is possible some selected columns are duplicated,
+    // in this case wrong recordStructField position leads wrong value or ArrayIndexOutOfBoundException
+    String[] distinctSelectedColumns = Arrays.stream(selectedColumns).distinct().toArray(String[]::new);
+    Schema projectedSchema = distinctSelectedColumns.length > 0 ?
+            tableSchema.select(distinctSelectedColumns) : tableSchema;
+    // the input split mapper handles does not belong to this table
+    // it is necessary to ensure projectedSchema equals to tableSchema,
+    // or we cannot find selectOperator's column from inspector
+    if (projectedSchema.columns().size() != distinctSelectedColumns.length) {
+      projectedSchema = tableSchema;
+    }

Review comment:
   @marton-bod: Could you please take a look? You know more about the schema projection.
   Thanks,
   Peter
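
   For context, the deduplication and fallback in the hunk above can be sketched in plain Java, without any Iceberg or Hive types. This is only an illustration under stated assumptions: `ProjectionSketch`, `select` and `project` are hypothetical names, columns are modelled as plain strings, and `select` is a stand-in that assumes unknown names are ignored and table order is kept. It is not the actual `Schema.select` implementation.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class ProjectionSketch {

  // Hypothetical stand-in for a schema projection: keeps only the table
  // columns whose names were requested, in table order (an assumption).
  static List<String> select(List<String> tableColumns, String[] selected) {
    Set<String> wanted = new LinkedHashSet<>(Arrays.asList(selected));
    return tableColumns.stream().filter(wanted::contains).collect(Collectors.toList());
  }

  // Mirrors the logic of the hunk: deduplicate, project, then fall back.
  static List<String> project(List<String> tableColumns, String[] selectedColumns) {
    // A self-join can list the same column more than once, so deduplicate first.
    String[] distinct = Arrays.stream(selectedColumns).distinct().toArray(String[]::new);
    List<String> projected = distinct.length > 0 ? select(tableColumns, distinct) : tableColumns;
    // If some requested names were not found in this table, the column counts
    // differ; fall back to the full table schema in that case.
    if (projected.size() != distinct.length) {
      projected = tableColumns;
    }
    return projected;
  }

  public static void main(String[] args) {
    List<String> table = Arrays.asList("id", "name", "ts");
    // Duplicated "id" collapses to a two-column projection.
    System.out.println(project(table, new String[] {"id", "name", "id"}));
    // A column from another table triggers the fallback to the full schema.
    System.out.println(project(table, new String[] {"other_col"}));
  }
}
```

   In this sketch the size comparison is the only signal that a requested column was missing, which is the same check the hunk relies on.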




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


