[ 
https://issues.apache.org/jira/browse/HIVE-12552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-12552:
------------------------------------
    Attachment: HIVE-12552.2.patch

Addressed review comments from [~hagleitn]

With 2.0f, it was generating 1009 tasks, and most of them were not getting 
enough data; the work could have been handled with fewer tasks. Got around 
11-13% improvement with fewer tasks in LLAP mode (the attached images show 
container mode, for debugging purposes). I haven't changed bytes per reducer 
in my run; changing it could bring down the number of reduce tasks.
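
To illustrate why bytes per reducer affects the task count, here is a rough 
sketch of a size-based estimate. It is not the code from the patch: the class 
name, method name, input sizes, and the cap of 1009 are all chosen for 
illustration only, and the actual estimation logic in Hive may differ.

```java
// Hypothetical sketch: estimate reduce tasks from input size.
// Names and numbers are illustrative assumptions, not Hive's implementation.
final class ReducerCountSketch {
    private ReducerCountSketch() {}

    /**
     * Ceiling of totalBytes / bytesPerReducer, kept within [1, maxReducers].
     */
    static int estimate(long totalBytes, long bytesPerReducer, int maxReducers) {
        if (bytesPerReducer <= 0 || maxReducers <= 0) {
            throw new IllegalArgumentException("bytesPerReducer and maxReducers must be positive");
        }
        // Integer ceiling division without going through floating point.
        long tasks = (totalBytes + bytesPerReducer - 1) / bytesPerReducer;
        tasks = Math.max(1L, tasks);
        return (int) Math.min((long) maxReducers, tasks);
    }

    public static void main(String[] args) {
        long tenGiB = 10L << 30;
        // 256 MiB per reducer
        System.out.println(estimate(tenGiB, 256L << 20, 1009)); // prints 40
        // 1 GiB per reducer: fewer, larger tasks
        System.out.println(estimate(tenGiB, 1L << 30, 1009));   // prints 10
    }
}
```

Under these assumptions, raising bytes per reducer from 256 MiB to 1 GiB cuts 
the estimate from 40 tasks to 10 for the same input.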



> Wrong number of reducer estimation causing job to fail
> ------------------------------------------------------
>
>                 Key: HIVE-12552
>                 URL: https://issues.apache.org/jira/browse/HIVE-12552
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Rajesh Balamohan
>         Attachments: 6_plan.txt, HIVE-12552.1.patch, HIVE-12552.2.patch, 
> With_max_partition_0.5_setting.png, with_default_setting.png
>
>
> {noformat}
> ], TaskAttempt 3 failed, info=[Error: Failure while running task: 
> attempt_1448429572030_1812_1_03_000029_3:java.lang.RuntimeException: 
> java.lang.RuntimeException: Hive Runtime Error while closing operators: 
> java.io.IOException: Illegal partition for 01 6c 6f 61 6e 20 61 63 63 6f 75 
> 6e 74 00 01 80 1f e1 d7 ff (-1)
>       at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:195)
>       at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:160)
>       at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:348)
>       at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:71)
>       at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:60)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:422)
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>       at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:60)
>       at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:35)
>       at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>       at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: Hive Runtime Error while closing 
> operators: java.io.IOException: Illegal partition for 01 6c 6f 61 6e 20 61 63 
> 63 6f 75 6e 74 00 01 80 1f e1 d7 ff (-1)
>       at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.close(ReduceRecordProcessor.java:341)
>       at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:186)
>       ... 14 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.io.IOException: Illegal partition for 01 6c 6f 61 6e 20 61 63 63 6f 75 
> 6e 74 00 01 80 1f e1 d7 ff (-1)
>       at 
> org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.process(ReduceSinkOperator.java:402)
>       at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:852)
>       at 
> org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.writeSingleRow(VectorGroupByOperator.java:904)
>       at 
> org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.access$400(VectorGroupByOperator.java:59)
>       at 
> org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator$ProcessingModeHashAggregate.flush(VectorGroupByOperator.java:469)
>       at 
> org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator$ProcessingModeHashAggregate.close(VectorGroupByOperator.java:375)
>       at 
> org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.closeOp(VectorGroupByOperator.java:950)
>       at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:656)
>       at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:670)
>       at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:670)
>       at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.close(ReduceRecordProcessor.java:318)
>       ... 15 more
> Caused by: java.io.IOException: Illegal partition for 01 6c 6f 61 6e 20 61 63 
> 63 6f 75 6e 74 00 01 80 1f e1 d7 ff (-1)
>       at 
> org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.collect(PipelinedSorter.java:379)
>       at 
> org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.write(PipelinedSorter.java:357)
>       at 
> org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput$1.write(OrderedPartitionedKVOutput.java:163)
>       at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor$TezKVOutputCollector.collect(TezProcessor.java:232)
>       at 
> org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.collect(ReduceSinkOperator.java:538)
>       at 
> org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.process(ReduceSinkOperator.java:385)
>       ... 25 more
> ]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 
> killedTasks:277, Vertex vertex_1448429572030_1812_1_03 [Reducer 2] 
> killed/failed due to:OWN_TASK_FAILURE]DAG did not succeed due to 
> VERTEX_FAILURE. failedVertices:1 killedVertices:0
> {noformat}
> Env: master branch.
> Map 1 <- Map 4 (BROADCAST_EDGE)
> Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE), Map 5 (CUSTOM_SIMPLE_EDGE)
> Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
> set hive.tez.max.partition.factor=0.5f;
> This causes "Reducer 3" to get 0 tasks, so the job fails after 
> Reducer 2. 
> Will attach the plan and screenshot shortly.
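
One way a factor below 1 can yield 0 tasks is integer truncation when the 
estimated reducer count is scaled by the factor. The sketch below assumes 
that mechanism; it is not taken from the attached patches, and the class and 
method names are hypothetical.

```java
// Hypothetical sketch: scaling a reducer estimate by a partition factor.
// Assumes the 0-task case comes from truncating (estimate * factor) to int.
final class PartitionFactorSketch {
    private PartitionFactorSketch() {}

    /** Plain truncation: returns 0 whenever estimate * factor is below 1. */
    static int naiveScaled(int estimatedReducers, float factor) {
        return (int) (estimatedReducers * factor);
    }

    /** Same scaling, but never fewer than one task. */
    static int clampedScaled(int estimatedReducers, float factor) {
        return Math.max(1, (int) (estimatedReducers * factor));
    }

    public static void main(String[] args) {
        // An estimate of 1 reducer with factor 0.5f truncates to 0 tasks.
        System.out.println(naiveScaled(1, 0.5f));   // prints 0
        // Clamping keeps at least one task for the downstream vertex.
        System.out.println(clampedScaled(1, 0.5f)); // prints 1
    }
}
```

With no downstream partitions, no valid partition index exists for an 
upstream record, which would be consistent with the "Illegal partition ... 
(-1)" error in the trace above.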



