liubo1022126 opened a new pull request #2614: URL: https://github.com/apache/iceberg/pull/2614
Issue: https://github.com/apache/iceberg/issues/2567

Run the following Hive SQL in the Hive shell, where table A (`ta`) is left-joined with table B (`tb`):

> select * from (select * from ta) p1 left join (select id, name, age from tb) p2 on p1.id = p2.id limit 10;

Whether or not table A and table B are in Iceberg format, when the table on the right is relatively large, some map operators fail to initialize.

I found that in `HiveIcebergSerDe`, the line `String[] selectedColumns = ColumnProjectionUtils.getReadColumnNames(configuration)` gets `selectedColumns` from the `hive.io.file.readcolumn.names` property in hconf, but that value sometimes does not correspond to the table handled by the current map.

Someone seems to have noticed this problem before, because the following comment and code already exist:

> // the input split mapper handles does not belong to this table
> // it is necessary to ensure projectedSchema equals to tableSchema,
> // or we cannot find selectOperator's column from inspector
> if (projectedSchema.columns().size() != distinctSelectedColumns.length) {
>   projectedSchema = tableSchema;
> }

But this check is not enough in some cases, for example when table `ta` also has columns `name` and `age`, which are the columns selected from table `tb`. The names read from the configuration (`id`, `name`, `age`) then all exist in `ta`, so the projected schema has the same number of columns as `distinctSelectedColumns` and the fallback above is not triggered.

While debugging, I noticed that when this happens, `serDeProperties.getProperty("columns")` holds the schema columns of the table handled by the current map, while `configuration.get("schema.evolution.columns")` holds the schema columns of the other table. So I compare the two to detect the mismatch, and with this check the query runs correctly.

**But I'm not sure this is enough. Can someone please help check?**

-------------------------

I also found another way to fix this problem, which I think is the best one, but it also requires a change in Hive. In Hive's `org.apache.hadoop.hive.ql.exec.MapOperator`, we can get the needed columns from `((TableScanOperator) conf.getAliasToWork().get(alias)).getConf().getNeededColumns()` and set them in hconf in `public void setChildren(Configuration hconf)`, using a property `px`.
Then, in Iceberg's `org.apache.iceberg.mr.hive.HiveIcebergSerDe`, we can correctly read the needed columns from property `px`.
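The first fix described above (comparing `columns` with `schema.evolution.columns` before trusting the read-column names) can be sketched in plain Java. This is a minimal sketch under stated assumptions, not the code from the PR: `java.util.Properties` stands in for the SerDe properties, a `Map<String, String>` stands in for Hive's `Configuration`, and the class `ProjectionGuard` and its methods are hypothetical. The case-insensitive, order-preserving selection only mimics how the projected schema is built from `distinctSelectedColumns`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

/**
 * Hypothetical sketch of the mismatch check. Properties stands in for the
 * SerDe properties and a Map stands in for Hive's Configuration, so the
 * example runs without Hive or Iceberg on the classpath.
 */
public class ProjectionGuard {

  /** Splits a comma-separated Hive column list such as "id,name,age". */
  public static List<String> splitColumns(String value) {
    if (value == null || value.isEmpty()) {
      return List.of();
    }
    return Arrays.asList(value.split(","));
  }

  /**
   * Returns true when the columns this SerDe was initialized with ("columns")
   * match the columns recorded in the job configuration
   * ("schema.evolution.columns"), i.e. the read-column names in the
   * configuration are assumed to belong to this table.
   */
  public static boolean configMatchesTable(Properties serDeProperties, Map<String, String> conf) {
    String ownColumns = serDeProperties.getProperty("columns");
    String confColumns = conf.get("schema.evolution.columns");
    if (ownColumns == null || confColumns == null) {
      // Nothing to compare against: assume the configuration is usable.
      return true;
    }
    return splitColumns(ownColumns).equals(splitColumns(confColumns));
  }

  /**
   * Chooses the columns to project. Falls back to the full table column list
   * when the configuration belongs to another table, or when not every
   * selected name is found in this table (the pre-existing size check).
   */
  public static List<String> projectedColumns(List<String> tableColumns, String[] selectedColumns,
                                              boolean configMatchesTable) {
    String[] distinct = Arrays.stream(selectedColumns).distinct().toArray(String[]::new);
    if (!configMatchesTable || distinct.length == 0) {
      return tableColumns;
    }
    // Mimic a case-insensitive select that keeps the table's column order.
    List<String> projected = new ArrayList<>();
    for (String column : tableColumns) {
      for (String selected : distinct) {
        if (column.equalsIgnoreCase(selected)) {
          projected.add(column);
          break;
        }
      }
    }
    if (projected.size() != distinct.length) {
      return tableColumns;
    }
    return projected;
  }

  public static void main(String[] args) {
    // Map task reading ta while the configuration still describes tb.
    Properties serDeProperties = new Properties();
    serDeProperties.setProperty("columns", "id,name,age,city");
    Map<String, String> conf = new HashMap<>();
    conf.put("schema.evolution.columns", "id,name,age,score");

    List<String> taColumns = List.of("id", "name", "age", "city");
    String[] readColumnNames = {"id", "name", "age"}; // selected from tb

    // Size check alone: it passes, so ta is narrowed to tb's selection.
    System.out.println(projectedColumns(taColumns, readColumnNames, true));
    // With the extra comparison the mismatch is detected and all ta columns are kept.
    boolean matches = configMatchesTable(serDeProperties, conf);
    System.out.println(projectedColumns(taColumns, readColumnNames, matches));
  }
}
```

Running `main` prints `[id, name, age]` for the size check alone, showing that the existing safeguard lets `ta` be narrowed to `tb`'s selected columns, and `[id, name, age, city]` once the comparison is added. The extra columns `city` and `score` are made-up example data.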
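The Hive-side alternative amounts to handing the needed columns from `MapOperator` to `HiveIcebergSerDe` through the job configuration. The sketch below models only that handoff, under stated assumptions: a `Map<String, String>` stands in for hconf, `px` is the placeholder property name from the description, and `NeededColumnsHandoff` with its two methods is hypothetical. It does not call Hive's `TableScanOperator` or any Iceberg API.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of passing the needed columns from Hive's MapOperator
 * to the SerDe through the job configuration (a Map stands in for hconf).
 */
public class NeededColumnsHandoff {

  // Placeholder property name; the proposal only calls it "px".
  public static final String NEEDED_COLUMNS_KEY = "px";

  /** Hive side (MapOperator.setChildren in the proposal): publish this map's needed columns. */
  public static void publishNeededColumns(Map<String, String> hconf, List<String> neededColumns) {
    hconf.put(NEEDED_COLUMNS_KEY, String.join(",", neededColumns));
  }

  /** Iceberg side (HiveIcebergSerDe in the proposal): read them back; empty if unset. */
  public static List<String> readNeededColumns(Map<String, String> hconf) {
    String value = hconf.get(NEEDED_COLUMNS_KEY);
    if (value == null || value.isEmpty()) {
      return List.of();
    }
    return Arrays.asList(value.split(","));
  }

  public static void main(String[] args) {
    Map<String, String> hconf = new HashMap<>();
    // Columns the table scan for tb would need in the example query.
    publishNeededColumns(hconf, List.of("id", "name", "age"));
    System.out.println(readNeededColumns(hconf)); // prints [id, name, age]
  }
}
```

The point of this design is that the value would be written from the current map's own `TableScanOperator`, so the SerDe would no longer rely on `hive.io.file.readcolumn.names`, which may describe another table. The sketch assumes one `px` value per map being initialized.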
