qphien commented on a change in pull request #2052:
URL: https://github.com/apache/iceberg/pull/2052#discussion_r554041892
##########
File path: mr/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergSerDe.java
##########
@@ -66,9 +67,7 @@ public void initialize(@Nullable Configuration configuration, Properties serDePr
     assertNotVectorizedTez(configuration);
     Schema tableSchema;
-    if (configuration.get(InputFormatConfig.TABLE_SCHEMA) != null) {
-      tableSchema = SchemaParser.fromJson(configuration.get(InputFormatConfig.TABLE_SCHEMA));
-    } else if (serDeProperties.get(InputFormatConfig.TABLE_SCHEMA) != null) {
+    if (serDeProperties.get(InputFormatConfig.TABLE_SCHEMA) != null) {
Review comment:
As shown below:
* Each mapper handles input splits that belong to a single table:
https://github.com/apache/hive/blob/113f6af7528f016bf821f7a746bad496cc93f834/ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java#L406
* The function **copyTableJobPropertiesToConf** copies the 'iceberg.mr.table.schema' property to the jobConf:
https://github.com/apache/hive/blob/113f6af7528f016bf821f7a746bad496cc93f834/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2427-L2443
If we join two tables, only one Iceberg schema exists in the jobConf, which leads to a wrong inspector:
* Empty inspector (the selected columns of the two tables do not overlap), e.g.
`SELECT o.order_id, o.customer_id, o.total, p.name FROM default.orders o JOIN default.products p ON o.product_id = p.id ORDER BY o.order_id`
selected columns in table default.orders: [order_id, total, customer_id, product_id]
selected columns in table default.products: [id, name]
* Incomplete inspector (the selected columns of the two tables overlap), e.g.
`SELECT c.first_name, o.order_id FROM default.orders o JOIN default.customers c ON o.customer_id = c.customer_id ORDER BY o.order_id DESC`
selected columns in table default.customers: [first_name, customer_id]
selected columns in table default.orders: [order_id, customer_id]
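
To make the overwrite concrete, here is a minimal Java sketch. It is not the actual Hive or Iceberg code: the jobConf and the per-table properties are modeled as plain `HashMap`s, the schema is a comma-separated column list instead of the real JSON, and `tableProps` and `inspectorColumns` are hypothetical helpers that stand in for building the inspector from a schema. Which table is copied last is also an assumption; the only point is that both tables write to the same key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SharedSchemaKeySketch {
    // Same property name as in the comment; every table writes to this one key.
    static final String TABLE_SCHEMA = "iceberg.mr.table.schema";

    // Hypothetical helper: per-table properties holding a simplified schema
    // (comma-separated column names, NOT the real JSON schema).
    static Map<String, String> tableProps(String... columns) {
        Map<String, String> props = new HashMap<>();
        props.put(TABLE_SCHEMA, String.join(",", columns));
        return props;
    }

    // Simplified stand-in for the Hive function of the same name:
    // table properties are copied into the shared jobConf, so a later
    // table replaces the value written by an earlier one.
    static void copyTableJobPropertiesToConf(Map<String, String> tableProps,
                                             Map<String, String> jobConf) {
        jobConf.putAll(tableProps);
    }

    // Hypothetical helper: the inspector only sees selected columns that
    // exist in the schema it was built from.
    static List<String> inspectorColumns(String schema, List<String> selected) {
        List<String> schemaColumns = Arrays.asList(schema.split(","));
        List<String> result = new ArrayList<>();
        for (String column : selected) {
            if (schemaColumns.contains(column)) {
                result.add(column);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> orders =
            tableProps("order_id", "total", "customer_id", "product_id");
        Map<String, String> products = tableProps("id", "name");

        Map<String, String> jobConf = new HashMap<>();
        copyTableJobPropertiesToConf(orders, jobConf);
        copyTableJobPropertiesToConf(products, jobConf); // replaces the orders schema

        List<String> selectedFromOrders =
            Arrays.asList("order_id", "total", "customer_id", "product_id");

        // Old lookup order: jobConf first, so orders is read with the products schema.
        System.out.println("jobConf first:         "
            + inspectorColumns(jobConf.get(TABLE_SCHEMA), selectedFromOrders));
        // Lookup order in this change: the table's own serDeProperties first.
        System.out.println("serDeProperties first: "
            + inspectorColumns(orders.get(TABLE_SCHEMA), selectedFromOrders));
    }
}
```

In this sketch the jobConf-first lookup yields no columns for default.orders, while the serDeProperties-first lookup yields all four selected columns. Swapping in the default.customers and default.orders column lists from the second query leaves only `customer_id`, which corresponds to the incomplete-inspector case.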