tprelle edited a comment on issue #2541: URL: https://github.com/apache/iceberg/issues/2541#issuecomment-831888173
@marton-bod sure:

For Tez, starting from the Apache 0.10.0 tag, I added:
- https://issues.apache.org/jira/projects/TEZ/issues/TEZ-4238
- https://issues.apache.org/jira/projects/TEZ/issues/TEZ-4264

For Hive it was a bit more complex; starting from the HDP 3.1.5-2-4 version I added:
- https://issues.apache.org/jira/browse/HIVE-23190 to be able to move to Tez 0.10
- https://issues.apache.org/jira/browse/HIVE-24629 for the output committer class
- https://issues.apache.org/jira/browse/HIVE-24207 because I need the Hive Tez processor to fill the jobconf (https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezProcessor.java#L202) with TEZ_VERTEX_ID_HIVE, so that the TaskAttemptWrapper can be built (https://github.com/apache/iceberg/blob/master/mr/src/main/java/org/apache/iceberg/mr/hive/TezUtil.java#L95)

With this version I still had an issue with this line: https://github.com/apache/iceberg/blob/master/mr/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergOutputCommitter.java#L382, because `conf.getNumReduceTasks()` and `conf.getNumMapTasks()` are never set by Hive. I found a way to fix it (but I do not know whether it is the correct one, or whether the problem only exists because of the HDP fork):

- For the ReduceWork plan, I added at https://github.com/apache/hive/blob/branch-3.1/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java#L800:
  `conf.setNumReduceTasks(reduceWork.isAutoReduceParallelism() ? reduceWork.getMaxReduceTasks() : reduceWork.getNumReduceTasks());`
- For MergeJoinWork, I added at https://github.com/apache/hive/blob/branch-3.1/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java#L596:
  `conf.setNumMapTasks(mapWorkList.size() + 1);`
- For MapWork, I was only able to do it in one configuration, with hive.compute.splits.in.am=false, by adding at https://github.com/apache/hive/blob/branch-3.1/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java#L716:
  `conf.setNumMapTasks(numTasks);`

But with hive.compute.splits.in.am=false, vectorization no longer works, because row IDs are no longer projected.
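The ReduceWork change above boils down to copying the planned reducer count into the job conf so the Iceberg output committer can read it back later. Here is a minimal, self-contained sketch of that logic; `SketchConf` and `SketchReduceWork` are hypothetical stand-ins for Hadoop's `JobConf` and Hive's `ReduceWork` (not the real classes), kept only to show the intent of the one-line patch:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Hadoop's JobConf: just enough to show how the
// reduce task count is stored and read back as a conf property.
class SketchConf {
    private final Map<String, String> props = new HashMap<>();

    void setNumReduceTasks(int n) {
        props.put("mapred.reduce.tasks", String.valueOf(n));
    }

    int getNumReduceTasks() {
        // -1 mimics "never set by Hive", the situation described above
        return Integer.parseInt(props.getOrDefault("mapred.reduce.tasks", "-1"));
    }
}

// Hypothetical stand-in for Hive's ReduceWork plan node.
class SketchReduceWork {
    private final boolean autoReduceParallelism;
    private final int numReduceTasks;
    private final int maxReduceTasks;

    SketchReduceWork(boolean auto, int num, int max) {
        this.autoReduceParallelism = auto;
        this.numReduceTasks = num;
        this.maxReduceTasks = max;
    }

    boolean isAutoReduceParallelism() { return autoReduceParallelism; }
    int getNumReduceTasks() { return numReduceTasks; }
    int getMaxReduceTasks() { return maxReduceTasks; }
}

public class DagUtilsPatchSketch {

    // The one-line patch proposed for DagUtils: with auto reduce parallelism
    // the final reducer count is only decided at runtime, so the upper bound
    // is used; otherwise the planned reducer count is copied into the conf.
    static void patchReduceConf(SketchConf conf, SketchReduceWork rw) {
        conf.setNumReduceTasks(rw.isAutoReduceParallelism()
            ? rw.getMaxReduceTasks()
            : rw.getNumReduceTasks());
    }

    public static void main(String[] args) {
        SketchConf conf = new SketchConf();

        patchReduceConf(conf, new SketchReduceWork(true, 4, 12));
        System.out.println("auto parallelism  -> " + conf.getNumReduceTasks());

        patchReduceConf(conf, new SketchReduceWork(false, 4, 12));
        System.out.println("fixed parallelism -> " + conf.getNumReduceTasks());
    }
}
```

The same idea applies to the MapWork and MergeJoinWork additions, except that the map task count comes from the split count (or the merged work list size) instead of the reducer plan.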
I need to set myself up a Hive build from the latest 3.1 version in order to be able to test. I used the Apache code as my reference, since Cloudera decided to remove the Hortonworks GitHub from the internet, but it seems to be almost the same code as the Apache branch-3.1.
