[ 
https://issues.apache.org/jira/browse/HIVE-11693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15302701#comment-15302701
 ] 

Selina Zhang commented on HIVE-11693:
-------------------------------------

[~rajesh.balamohan], we hit the same issue recently. But I think the patch you 
attached did not fix the root problem. 

The issue is actually CommonMergeJoinOperator only set big table position when 
it has inputs for big table. 

{code:title=CommonMergeJoinOperator.java}
  @Override
  public void process(Object row, int tag) throws HiveException {
    posBigTable = (byte) conf.getBigTablePosition();
...
{code}

If the input is empty, the above method will not be called. In the query you 
listed, a subquery is involved. The generated table is tagged as 0, while the 
left table is tagged as 1.  GenTezWork.java set the big table position as 1 for 
both reduce work and CommonJoinOperator. In reduce phase, when 
ReduceRecordProcessor got executed, it retrieves the record from big table:

{code:title=ReduceRecordProcessor.java}
@Override
  void run() throws Exception {
....
    // run the operator pipeline
    while (sources[bigTablePosition].pushRecord()) {
    }
  }
{code}

The big table position here is 1. If the input from the big table is empty, 
this is the only place pushRecord() be called to read big table. However, 
because the CommonMergeJoinOperator missed set big table position, in closeOp() 
part, it will think tag 1 is small table, so another pushRecord() is called to 
retrieve table content. Then we see the exception listed in this JIRA. 

Please let me know if my analysis has problem. If you think it is correct, can 
you update the patch?

Thanks

> CommonMergeJoinOperator throws exception with tez
> -------------------------------------------------
>
>                 Key: HIVE-11693
>                 URL: https://issues.apache.org/jira/browse/HIVE-11693
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Rajesh Balamohan
>         Attachments: HIVE-11693.1.patch
>
>
> Got this when executing a simple query with latest hive build + tez latest 
> version.
> {noformat}
> Error: Failure while running task: 
> attempt_1439860407967_0291_2_03_000045_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: Hive Runtime Error while closing operators: 
> java.lang.RuntimeException: java.io.IOException: Please check if you are 
> invoking moveToNext() even after it returned false.
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
> at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:349)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:71)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:60)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:60)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:35)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: Hive Runtime Error while closing 
> operators: java.lang.RuntimeException: java.io.IOException: Please check if 
> you are invoking moveToNext() even after it returned false.
> at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.close(ReduceRecordProcessor.java:316)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:162)
> ... 14 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.RuntimeException: java.io.IOException: Please check if you are 
> invoking moveToNext() even after it returned false.
> at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.fetchOneRow(CommonMergeJoinOperator.java:412)
> at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.fetchNextGroup(CommonMergeJoinOperator.java:375)
> at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.doFirstFetchIfNeeded(CommonMergeJoinOperator.java:482)
> at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.joinFinalLeftData(CommonMergeJoinOperator.java:434)
> at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.closeOp(CommonMergeJoinOperator.java:384)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:616)
> at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.close(ReduceRecordProcessor.java:292)
> ... 15 more
> Caused by: java.lang.RuntimeException: java.io.IOException: Please check if 
> you are invoking moveToNext() even after it returned false.
> at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:291)
> at 
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.fetchOneRow(CommonMergeJoinOperator.java:400)
> ... 21 more
> Caused by: java.io.IOException: Please check if you are invoking moveToNext() 
> even after it returned false.
> at 
> org.apache.tez.runtime.library.common.ValuesIterator.hasCompletedProcessing(ValuesIterator.java:223)
> at 
> org.apache.tez.runtime.library.common.ValuesIterator.moveToNext(ValuesIterator.java:105)
> at 
> org.apache.tez.runtime.library.input.OrderedGroupedKVInput$OrderedGroupedKeyValuesReader.next(OrderedGroupedKVInput.java:308)
> at 
> org.apache.hadoop.hive.ql.exec.tez.KeyValuesFromKeyValues.next(KeyValuesFromKeyValues.java:46)
> at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:249)
> ... 22 more
> {noformat}
> Not sure if this is related to HIVE-11016. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to