qphien commented on a change in pull request #2052:
URL: https://github.com/apache/iceberg/pull/2052#discussion_r554041892



##########
File path: mr/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergSerDe.java
##########
@@ -66,9 +67,7 @@ public void initialize(@Nullable Configuration configuration, Properties serDePr
     assertNotVectorizedTez(configuration);
 
     Schema tableSchema;
-    if (configuration.get(InputFormatConfig.TABLE_SCHEMA) != null) {
-      tableSchema = SchemaParser.fromJson(configuration.get(InputFormatConfig.TABLE_SCHEMA));
-    } else if (serDeProperties.get(InputFormatConfig.TABLE_SCHEMA) != null) {
+    if (serDeProperties.get(InputFormatConfig.TABLE_SCHEMA) != null) {

Review comment:
   As shown below:
   * A mapper handles input splits that belong to a single table:
   https://github.com/apache/hive/blob/113f6af7528f016bf821f7a746bad496cc93f834/ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java#L406
   * The function **copyTableJobPropertiesToConf** copies the 'iceberg.mr.table.schema' property to the jobConf:
   https://github.com/apache/hive/blob/113f6af7528f016bf821f7a746bad496cc93f834/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2427-L2443

   If we join two tables, only one Iceberg schema exists in the jobConf, which leads to a wrong inspector:
   * Empty inspector (the columns selected from the two tables do not overlap), e.g.
   `SELECT o.order_id, o.customer_id, o.total, p.name FROM default.orders o JOIN default.products p ON o.product_id = p.id ORDER BY o.order_id`
   selected columns in table default.orders: [order_id, total, customer_id, product_id]
   selected columns in table default.products: [id, name]
   * Incomplete inspector (the columns selected from the two tables overlap), e.g.
   `SELECT c.first_name, o.order_id FROM default.orders o JOIN default.customers c ON o.customer_id = c.customer_id ORDER BY o.order_id DESC`
   selected columns in table default.customers: [first_name, customer_id]
   selected columns in table default.orders: [order_id, customer_id]
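
To make the overwrite concrete, here is a minimal, self-contained sketch of the problem described above. It is not the Hive or Iceberg code: a plain `Map` stands in for the jobConf, the schema values are placeholder strings rather than real schema JSON, and the class and helper names (`SchemaLookupSketch`, `copyToSharedConf`, `schemaConfFirst`, `schemaFromSerDeProperties`, `tableProps`) are hypothetical. Only the key 'iceberg.mr.table.schema' comes from the comment. Which table ends up in the shared conf is an assumption of the sketch (the last one copied).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

class SchemaLookupSketch {
  // Key quoted in the review comment; everything else here is illustrative.
  static final String TABLE_SCHEMA = "iceberg.mr.table.schema";

  // Hypothetical helper: per-table properties holding a placeholder schema string.
  static Properties tableProps(String schema) {
    Properties props = new Properties();
    props.setProperty(TABLE_SCHEMA, schema);
    return props;
  }

  // Simplified stand-in for copying each table's job properties into one shared conf.
  // Both tables use the same key, so the table copied last overwrites the earlier one.
  static Map<String, String> copyToSharedConf(Properties... tables) {
    Map<String, String> jobConf = new LinkedHashMap<>();
    for (Properties table : tables) {
      for (String name : table.stringPropertyNames()) {
        jobConf.put(name, table.getProperty(name));
      }
    }
    return jobConf;
  }

  // Lookup order of the removed branch: shared conf first, then the table's own properties.
  static String schemaConfFirst(Map<String, String> jobConf, Properties serDeProperties) {
    String fromConf = jobConf.get(TABLE_SCHEMA);
    return fromConf != null ? fromConf : serDeProperties.getProperty(TABLE_SCHEMA);
  }

  // Lookup that consults only the table's own properties.
  static String schemaFromSerDeProperties(Properties serDeProperties) {
    return serDeProperties.getProperty(TABLE_SCHEMA);
  }

  public static void main(String[] args) {
    Properties orders = tableProps("orders-schema");
    Properties products = tableProps("products-schema");
    Map<String, String> jobConf = copyToSharedConf(orders, products);

    // The SerDe for 'orders' sees the schema of 'products' when the conf is read first.
    System.out.println(schemaConfFirst(jobConf, orders));   // prints "products-schema"
    // Reading the per-table properties returns the schema of the right table.
    System.out.println(schemaFromSerDeProperties(orders));  // prints "orders-schema"
  }
}
```

In this sketch the conf-first lookup hands the `orders` SerDe the schema of `products`, which is the situation that produces the empty or incomplete inspector in the two queries above.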
   
   



