tprelle edited a comment on issue #2541: URL: https://github.com/apache/iceberg/issues/2541#issuecomment-831888173
@marton-bod sure:

For Tez, starting from the Apache 0.10.0 tag, I added:
- https://issues.apache.org/jira/projects/TEZ/issues/TEZ-4238
- https://issues.apache.org/jira/projects/TEZ/issues/TEZ-4264

For Hive it was a bit more complex; starting from the HDP 3.1.5-2-4 version I added:
- https://issues.apache.org/jira/browse/HIVE-23190 to be able to move to Tez 0.10
- https://issues.apache.org/jira/browse/HIVE-24629 for the output committer class
- https://issues.apache.org/jira/browse/HIVE-24207 because I need the Hive Tez processor to fill the jobconf (https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezProcessor.java#L202) with TEZ_VERTEX_ID_HIVE, so that the TaskAttemptWrapper can be built (https://github.com/apache/iceberg/blob/master/mr/src/main/java/org/apache/iceberg/mr/hive/TezUtil.java#L95)

With this version I still had an issue with this line: https://github.com/apache/iceberg/blob/master/mr/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergOutputCommitter.java#L382, because `conf.getNumReduceTasks()` and `conf.getNumMapTasks()` are never set by Hive. I found a way to fix it (but I do not know whether it is the correct one, or whether the problem only exists because of the HDP fork):

- For the ReduceWork plan, I added at https://github.com/apache/hive/blob/branch-3.1/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java#L800:
  `conf.setNumReduceTasks(reduceWork.isAutoReduceParallelism() ? reduceWork.getMaxReduceTasks() : reduceWork.getNumReduceTasks());`
- For MergeJoinWork, I added at https://github.com/apache/hive/blob/branch-3.1/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java#L596:
  `conf.setNumMapTasks(mapWorkList.size() + 1);`
- For MapWork, I was only able to do it in one configuration, with hive.compute.splits.in.am=false, by adding at https://github.com/apache/hive/blob/branch-3.1/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java#L716:
  `conf.setNumMapTasks(numTasks);`

But with hive.compute.splits.in.am=false, vectorization no longer works, because row IDs are no longer projected.
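The ReduceWork change above boils down to copying the planned reducer count into the job conf so the Iceberg output committer can read it back later. Here is a minimal, self-contained sketch of that logic; `SketchConf` and `SketchReduceWork` are hypothetical stand-ins for Hadoop's `JobConf` and Hive's `ReduceWork` (not the real classes), kept only to show the intent of the one-line patch:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Hadoop's JobConf: just enough to show how the
// reduce task count is stored and read back as a conf property.
class SketchConf {
    private final Map<String, String> props = new HashMap<>();

    void setNumReduceTasks(int n) {
        props.put("mapred.reduce.tasks", String.valueOf(n));
    }

    int getNumReduceTasks() {
        // -1 mimics "never set by Hive", the situation described above
        return Integer.parseInt(props.getOrDefault("mapred.reduce.tasks", "-1"));
    }
}

// Hypothetical stand-in for Hive's ReduceWork plan node.
class SketchReduceWork {
    private final boolean autoReduceParallelism;
    private final int numReduceTasks;
    private final int maxReduceTasks;

    SketchReduceWork(boolean auto, int num, int max) {
        this.autoReduceParallelism = auto;
        this.numReduceTasks = num;
        this.maxReduceTasks = max;
    }

    boolean isAutoReduceParallelism() { return autoReduceParallelism; }
    int getNumReduceTasks() { return numReduceTasks; }
    int getMaxReduceTasks() { return maxReduceTasks; }
}

public class DagUtilsPatchSketch {

    // The one-line patch proposed for DagUtils: with auto reduce parallelism
    // the final reducer count is only decided at runtime, so the upper bound
    // is used; otherwise the planned reducer count is copied into the conf.
    static void patchReduceConf(SketchConf conf, SketchReduceWork rw) {
        conf.setNumReduceTasks(rw.isAutoReduceParallelism()
            ? rw.getMaxReduceTasks()
            : rw.getNumReduceTasks());
    }

    public static void main(String[] args) {
        SketchConf conf = new SketchConf();

        patchReduceConf(conf, new SketchReduceWork(true, 4, 12));
        System.out.println("auto parallelism  -> " + conf.getNumReduceTasks());

        patchReduceConf(conf, new SketchReduceWork(false, 4, 12));
        System.out.println("fixed parallelism -> " + conf.getNumReduceTasks());
    }
}
```

The same idea applies to the MapWork and MergeJoinWork additions, except that the map task count comes from the split count (or the merged work list size) instead of the reducer plan.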
I need to set myself up a Hive build from the latest 3.1 version in order to be able to test. I used the Apache code as my reference, since Cloudera decided to remove the Hortonworks GitHub from the internet, but it seems to be almost the same code as the Apache branch-3.1.
