[
https://issues.apache.org/jira/browse/HIVE-11016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611100#comment-14611100
]
Sergey Shelukhin edited comment on HIVE-11016 at 7/1/15 10:28 PM:
------------------------------------------------------------------
[~vikram.dixit] can you take a look at java change?
Issue appears to be that the first fetch in process() sets firstFetchHappened =
true before doing it, whereas in closeOp/joinFinalLeftData, it would call
fetchNextGroup first when there's > 1 small tables (because firstFetchHappened
is set only in the last iteration of the loop).
That would cause first fetch to be done again when operator calls itself thru
ReduceRecordSource and observes firstFetchDone is false; so, it would call
fetch on all small tables, including the one for which it was called by
joinFinalLeftData on the same callstack; that would set fetchDone to true for
this tag because it is exhausted, and then when recursion unwinds back to the
call made from joinFinalLeftData, it would be reset back to false.
I changed the logic to be the same in both places for the first fetch, added
warning log for the blind reset (such recursion shouldn't happen though, so I
added warning and didn't remove the reset).
Also after first fetch logic removal, the loop logic in joinFinalLeftData
became really strange... I replaced it with an if and left a TODO
was (Author: sershe):
[~vikram.dixit] can you take a look at java change?
Issue appears to be that the first fetch in process() sets firstFetchHappened =
true before doing it, whereas in closeOp/joinFinalLeftData, it would call
fetchNextGroup first when there's > 1 small tables (because firstFetchHappened
is set only in the last iteration of the loop).
That would cause first fetch to be done again when operator calls itself thru
ReduceRecordSource, fetchDone to be set to true for this source in that
recursive call because it was exhausted, and reset to false again in the
top-level fetchOneRow called (via some methods) from the first fetch logic at
joinFinalLeftData.
I changed the logic to be the same in both places for the first fetch, added
warning log for the blind reset (such recursion shouldn't happen though, so I
added warning and didn't remove the reset).
Also after first fetch logic removal, the loop logic in joinFinalLeftData
became really strange... I replaced it with an if and left a TODO
> LLAP: MiniTez mergejoin test fails with Tez input error
> -------------------------------------------------------
>
> Key: HIVE-11016
> URL: https://issues.apache.org/jira/browse/HIVE-11016
> Project: Hive
> Issue Type: Sub-task
> Reporter: Sergey Shelukhin
> Assignee: Sergey Shelukhin
> Attachments: HIVE-11016.patch
>
>
> Didn't spend a lot of time investigating, but from the code it looks like we
> shouldn't be calling it after false at least on this path (after false from
> next, pushRecord returns false, which causes fetchDone to be set for the tag;
> and fetchOneRow is not called if that is set; should be ok unless tags are
> messed up?)
> {noformat}
> 2015-06-15 17:28:17,272 ERROR [main]: SessionState
> (SessionState.java:printError(984)) - Vertex failed, vertexName=Reducer 2,
> vertexId=vertex_1434414363282_0002_17_03, diagnostics=[Task failed,
> taskId=task_1434414363282_0002_17_03_000002, diagnostics=[TaskAttempt 0
> failed, info=[Error: Failure while running task:
> attempt_1434414363282_0002_17_03_000002_0:java.lang.RuntimeException:
> java.lang.RuntimeException: Hive Runtime Error while closing operators:
> java.lang.RuntimeException: java.io.IOException: Please check if you are
> invoking moveToNext() even after it returned false.
> at
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:181)
> at
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:146)
> at
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:349)
> at
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:71)
> at
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:60)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
> at
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:60)
> at
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:35)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: Hive Runtime Error while closing
> operators: java.lang.RuntimeException: java.io.IOException: Please check if
> you are invoking moveToNext() even after it returned false.
> at
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.close(ReduceRecordProcessor.java:338)
> at
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:172)
> ... 14 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException:
> java.lang.RuntimeException: java.io.IOException: Please check if you are
> invoking moveToNext() even after it returned false.
> at
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.fetchOneRow(CommonMergeJoinOperator.java:412)
> at
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.fetchNextGroup(CommonMergeJoinOperator.java:380)
> at
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.joinFinalLeftData(CommonMergeJoinOperator.java:449)
> at
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.closeOp(CommonMergeJoinOperator.java:389)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:651)
> at
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.close(ReduceRecordProcessor.java:314)
> ... 15 more
> Caused by: java.lang.RuntimeException: java.io.IOException: Please check if
> you are invoking moveToNext() even after it returned false.
> at
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:302)
> at
> org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.fetchOneRow(CommonMergeJoinOperator.java:404)
> ... 20 more
> Caused by: java.io.IOException: Please check if you are invoking moveToNext()
> even after it returned false.
> at
> org.apache.tez.runtime.library.common.ValuesIterator.hasCompletedProcessing(ValuesIterator.java:223)
> at
> org.apache.tez.runtime.library.common.ValuesIterator.moveToNext(ValuesIterator.java:105)
> at
> org.apache.tez.runtime.library.input.OrderedGroupedKVInput$OrderedGroupedKeyValuesReader.next(OrderedGroupedKVInput.java:308)
> at
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:260)
> ... 21 more
> {noformat}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)