[
https://issues.apache.org/jira/browse/HIVE-21746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16841809#comment-16841809
]
Jason Dere commented on HIVE-21746:
-----------------------------------
I believe the dynamically partitioned hash join has issues when the join keys
are constant folded.
Looking at the ReduceSink output that feeds into the dynamically partitioned
hash join:
{noformat}
Reduce Output Operator
key expressions: _col20 (type: string), 'HR3' (type: string)
null sort order: aa
sort order: ++
Map-reduce partition columns: _col20 (type: string), 'HR3' (type: string)
Statistics: Num rows: 3800000 Data size: 1288485344 Basic stats: COMPLETE Column stats: PARTIAL
tag: 0
value expressions: _col2 (type: timestamp), _col3 (type: timestamp), _col51 (type: timestamp), _col124 (type: timestamp)
{noformat}
So the value expressions in the ReduceSink consist of 4 timestamp columns, and
the data written out and sent to the Join appears to match that.
However, the input schema to the MapJoin operator shows 5 columns rather than 4:
{noformat}
*** valCols[0] for JOIN JOIN_13: [Column[VALUE._col2], Column[VALUE._col3],
Column[KEY.reducesinkkey1], Column[VALUE._col49], Column[VALUE._col122]]
{noformat}
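The types are (timestamp, timestamp, string, timestamp, timestamp). Reading
rows that carry only 4 values against a 5-column schema would plausibly explain
the ArrayIndexOutOfBoundsException in the stack trace.
Note that the third column in this list is KEY.reducesinkkey1. Key columns
should have been filtered out of the value columns in
MapJoinProcessor.getMapJoinDesc(), during the section that populates
valueTableDescs.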
But the keyExprMap generated by ExprNodeDescUtils.resolveJoinKeysAsRSColumns(),
which is only used for dynamically partitioned hash join, does not match the
KEY.reducesinkkey1 column from the ReduceSinkOperator, so that key column is
not filtered out of the value columns.
The column reference generated from the constant-folded column, in keyExprMap:
{noformat}
1 = {ExprNodeColumnDesc@9714} "Column[KEY.reducesinkkey1]"
column = "KEY.reducesinkkey1"
tabAlias = ""
isPartitionColOrVirtualCol = false
isSkewedCol = false
typeInfo = {PrimitiveTypeInfo@9719} "string"
{noformat}
What should have been the corresponding key in the ReduceSinkOperator:
{noformat}
expr = {ExprNodeColumnDesc@8704} "Column[KEY.reducesinkkey1]"
column = "KEY.reducesinkkey1"
tabAlias = "t2"
isPartitionColOrVirtualCol = true
isSkewedCol = false
typeInfo = {PrimitiveTypeInfo@9719} "string"
{noformat}
The difference is that the ReduceSinkOperator key has tabAlias = "t2", while
the one generated by ExprNodeDescUtils.resolveJoinKeysAsRSColumns() currently
has its tabAlias hardcoded to "".
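To make the mismatch concrete, here is a minimal sketch of how a key filter
that compares table aliases would miss the key column. ColumnRef, sameAs() and
filterKeys() are hypothetical stand-ins, not Hive classes or methods, and the
"t2" aliases on the VALUE columns are only illustrative. The sketch assumes
that a match requires column name, table alias and type to all be equal.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TabAliasMismatchSketch {

  // Hypothetical, simplified column reference; NOT Hive's ExprNodeColumnDesc.
  static final class ColumnRef {
    final String column;
    final String tabAlias;
    final String type;

    ColumnRef(String column, String tabAlias, String type) {
      this.column = column;
      this.tabAlias = tabAlias;
      this.type = type;
    }

    // Assumption: a match requires column, table alias and type to all be equal.
    boolean sameAs(ColumnRef other) {
      return column.equals(other.column)
          && tabAlias.equals(other.tabAlias)
          && type.equals(other.type);
    }
  }

  // Keep only the value columns that do not match any key reference.
  static List<ColumnRef> filterKeys(List<ColumnRef> valueCols, List<ColumnRef> keyRefs) {
    List<ColumnRef> kept = new ArrayList<>();
    for (ColumnRef value : valueCols) {
      boolean isKey = false;
      for (ColumnRef key : keyRefs) {
        if (value.sameAs(key)) {
          isKey = true;
          break;
        }
      }
      if (!isKey) {
        kept.add(value);
      }
    }
    return kept;
  }

  // The five columns from the valCols dump; the "t2" aliases are illustrative.
  static List<ColumnRef> sampleValueCols() {
    return Arrays.asList(
        new ColumnRef("VALUE._col2", "t2", "timestamp"),
        new ColumnRef("VALUE._col3", "t2", "timestamp"),
        new ColumnRef("KEY.reducesinkkey1", "t2", "string"),
        new ColumnRef("VALUE._col49", "t2", "timestamp"),
        new ColumnRef("VALUE._col122", "t2", "timestamp"));
  }

  public static void main(String[] args) {
    List<ColumnRef> valueCols = sampleValueCols();

    // Reference with the alias hardcoded to "": the key column is not recognized.
    ColumnRef resolved = new ColumnRef("KEY.reducesinkkey1", "", "string");
    System.out.println(filterKeys(valueCols, Arrays.asList(resolved)).size()); // prints 5

    // Reference carrying the parent's alias: the key column is filtered out.
    ColumnRef expected = new ColumnRef("KEY.reducesinkkey1", "t2", "string");
    System.out.println(filterKeys(valueCols, Arrays.asList(expected)).size()); // prints 4
  }
}
```

With the empty alias the filter leaves all 5 columns in place, which matches
the 5-column input schema seen on the MapJoin operator.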
One solution is for ExprNodeConstantDesc to keep a foldedFromTab for the table
alias, in addition to the foldedFromCol it already has. That way
ExprNodeDescUtils.resolveJoinKeysAsRSColumns() can generate a column reference
with the same tabAlias as its parent ReduceSinkOperator.
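A rough sketch of that idea, under stated assumptions: FoldedConstant and
ColumnRef are hypothetical simplifications rather than the real
ExprNodeConstantDesc and ExprNodeColumnDesc, the two resolve methods are
illustrative and do not reflect the actual signature of
resolveJoinKeysAsRSColumns(), and "some_col" is a placeholder column name.

```java
class FoldedFromTabSketch {

  // Hypothetical stand-in for a constant that remembers what it was folded from.
  static final class FoldedConstant {
    final Object value;
    final String foldedFromCol; // column the constant replaced
    final String foldedFromTab; // proposed addition: alias of that column's table

    FoldedConstant(Object value, String foldedFromCol, String foldedFromTab) {
      this.value = value;
      this.foldedFromCol = foldedFromCol;
      this.foldedFromTab = foldedFromTab;
    }
  }

  // Hypothetical, simplified column reference.
  static final class ColumnRef {
    final String column;
    final String tabAlias;

    ColumnRef(String column, String tabAlias) {
      this.column = column;
      this.tabAlias = tabAlias;
    }
  }

  // Behaviour described in the comment: the alias is always "".
  static ColumnRef resolveWithoutAlias(FoldedConstant constant, int keyIndex) {
    return new ColumnRef("KEY.reducesinkkey" + keyIndex, "");
  }

  // Proposed behaviour: reuse the alias recorded when the constant was folded.
  static ColumnRef resolveWithAlias(FoldedConstant constant, int keyIndex) {
    String alias = constant.foldedFromTab == null ? "" : constant.foldedFromTab;
    return new ColumnRef("KEY.reducesinkkey" + keyIndex, alias);
  }

  public static void main(String[] args) {
    // "some_col" is a placeholder; the real folded column name is not shown above.
    FoldedConstant hr3 = new FoldedConstant("HR3", "some_col", "t2");
    System.out.println("[" + resolveWithoutAlias(hr3, 1).tabAlias + "]"); // prints []
    System.out.println("[" + resolveWithAlias(hr3, 1).tabAlias + "]");    // prints [t2]
  }
}
```

Falling back to "" when no alias was recorded keeps the existing behaviour for
constants that were not folded from a table column.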
> ArrayIndexOutOfBoundsException during dynamically partitioned hash join, with
> CBO disabled
> ------------------------------------------------------------------------------------------
>
> Key: HIVE-21746
> URL: https://issues.apache.org/jira/browse/HIVE-21746
> Project: Hive
> Issue Type: Bug
> Components: Query Planning
> Reporter: Jason Dere
> Assignee: Jason Dere
> Priority: Major
>
> ArrayIndexOutOfBounds exception during query execution with dynamically
> partitioned hash join.
> Found on Hive 2.x. Seems to occur with CBO disabled/failed.
> Disabling constant propagation seems to allow the query to succeed.
> {noformat}
> java.lang.ArrayIndexOutOfBoundsException: 203
> at
> org.apache.hadoop.hive.serde2.io.TimestampWritable.getTotalLength(TimestampWritable.java:217)
> ~[hive-exec-2.1.0.2.6.4.119-3.jar:2.1.0.2.6.4.119-3]
> at
> org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.checkObjectByteInfo(LazyBinaryUtils.java:205)
> ~[hive-exec-2.1.0.2.6.4.119-3.jar:2.1.0.2.6.4.119-3]
> at
> org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.parse(LazyBinaryStruct.java:142)
> ~[hive-exec-2.1.0.2.6.4.119-3.jar:2.1.0.2.6.4.119-3]
> at
> org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.getFieldsAsList(LazyBinaryStruct.java:281)
> ~[hive-exec-2.1.0.2.6.4.119-3.jar:2.1.0.2.6.4.119-3]
> at
> org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer$ReusableRowContainer.unpack(MapJoinBytesTableContainer.java:744)
> ~[hive-exec-2.1.0.2.6.4.119-3.jar:2.1.0.2.6.4.119-3]
> at
> org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer$ReusableRowContainer.next(MapJoinBytesTableContainer.java:730)
> ~[hive-exec-2.1.0.2.6.4.119-3.jar:2.1.0.2.6.4.119-3]
> at
> org.apache.hadoop.hive.ql.exec.persistence.MapJoinBytesTableContainer$ReusableRowContainer.next(MapJoinBytesTableContainer.java:605)
> ~[hive-exec-2.1.0.2.6.4.119-3.jar:2.1.0.2.6.4.119-3]
> at
> org.apache.hadoop.hive.ql.exec.persistence.UnwrapRowContainer.next(UnwrapRowContainer.java:70)
> ~[hive-exec-2.1.0.2.6.4.119-3.jar:2.1.0.2.6.4.119-3]
> at
> org.apache.hadoop.hive.ql.exec.persistence.UnwrapRowContainer.next(UnwrapRowContainer.java:34)
> ~[hive-exec-2.1.0.2.6.4.119-3.jar:2.1.0.2.6.4.119-3]
> at
> org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:819)
> ~[hive-exec-2.1.0.2.6.4.119-3.jar:2.1.0.2.6.4.119-3]
> at
> org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:924)
> ~[hive-exec-2.1.0.2.6.4.119-3.jar:2.1.0.2.6.4.119-3]
> at
> org.apache.hadoop.hive.ql.exec.MapJoinOperator.process(MapJoinOperator.java:456)
> ~[hive-exec-2.1.0.2.6.4.119-3.jar:2.1.0.2.6.4.119-3]
> at
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:359)
> ~[hive-exec-2.1.0.2.6.4.119-3.jar:2.1.0.2.6.4.119-3]
> at
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:290)
> ~[hive-exec-2.1.0.2.6.4.119-3.jar:2.1.0.2.6.4.119-3]
> at
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:319)
> ~[hive-exec-2.1.0.2.6.4.119-3.jar:2.1.0.2.6.4.119-3]
> at
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:189)
> ~[hive-exec-2.1.0.2.6.4.119-3.jar:2.1.0.2.6.4.119-3]
> at
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:172)
> ~[hive-exec-2.1.0.2.6.4.119-3.jar:2.1.0.2.6.4.119-3]
> at
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:377)
> ~[tez-runtime-internals-0.8.4.2.6.4.119-3.jar:0.8.4.2.6.4.119-3]
> at
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
> ~[tez-runtime-internals-0.8.4.2.6.4.119-3.jar:0.8.4.2.6.4.119-3]
> at
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
> ~[tez-runtime-internals-0.8.4.2.6.4.119-3.jar:0.8.4.2.6.4.119-3]
> at java.security.AccessController.doPrivileged(Native Method)
> ~[?:1.8.0_112]
> at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_112]
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
> ~[hadoop-common-2.7.3.2.6.4.119-3.jar:?]
> at
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
> ~[tez-runtime-internals-0.8.4.2.6.4.119-3.jar:0.8.4.2.6.4.119-3]
> at
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
> ~[tez-runtime-internals-0.8.4.2.6.4.119-3.jar:0.8.4.2.6.4.119-3]
> at
> org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> ~[tez-common-0.8.4.2.6.4.119-3.jar:0.8.4.2.6.4.119-3]
> at
> org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118)
> ~[hive-llap-server-2.1.0.2.6.4.119-3.jar:2.1.0.2.6.4.119-3]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> ~[?:1.8.0_112]
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> [?:1.8.0_112]
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> [?:1.8.0_112]
> at java.lang.Thread.run(Thread.java:745) [?:1.8.0_112]
> {noformat}