[
https://issues.apache.org/jira/browse/HIVE-11606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14737335#comment-14737335
]
Vikram Dixit K commented on HIVE-11606:
---------------------------------------
There is an optimization for inner joins: if the hash table is empty, we set
the done flag on the operator. With container reuse, however, this causes
bucket map joins to produce incorrect results, because the operators in the
cached work do not process records once the done flag has been set, even when
a different bucket is being processed. We prevent caching of the input for
bucket map joins but not of the work, which makes sense because the operator
pipeline hasn't changed. Ideally, we should reset the done flag only for bucket
map joins, but this is not a big issue for broadcast joins, because the same
optimization runs again anyway and stops processing early, in the operator
initialization (loadHashTable) phase itself.
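
To make the failure mode concrete, here is a minimal, self-contained sketch of
a cached operator whose done flag outlives the bucket it was set for. It is not
Hive code: CachedJoinOperator, the resetDone parameter, and the sample data are
hypothetical stand-ins, and only the loadHashTable name is borrowed from the
comment above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DoneFlagSketch {

    /** Hypothetical stand-in for a cached map-join operator; not a Hive class. */
    static class CachedJoinOperator {
        private boolean done;
        private Map<Integer, String> hashTable = new HashMap<>();

        /** Loads the small-table side for one bucket. */
        void loadHashTable(Map<Integer, String> smallSide, boolean resetDone) {
            if (resetDone) {
                done = false; // forget the decision made for the previous bucket
            }
            hashTable = smallSide;
            if (hashTable.isEmpty()) {
                done = true; // an inner join against an empty table emits nothing
            }
        }

        /** Joins big-table keys against the currently loaded hash table. */
        List<String> process(List<Integer> bigSideKeys) {
            List<String> out = new ArrayList<>();
            for (int key : bigSideKeys) {
                if (done) {
                    break; // with a stale flag, rows are silently dropped here
                }
                String match = hashTable.get(key);
                if (match != null) {
                    out.add(key + ":" + match);
                }
            }
            return out;
        }
    }

    /** Runs two buckets through one reused operator; returns the second bucket's output. */
    static List<String> secondBucketOutput(boolean resetDone) {
        CachedJoinOperator op = new CachedJoinOperator();

        // Bucket 0: the small side is empty, so the done flag gets set.
        op.loadHashTable(new HashMap<>(), resetDone);
        op.process(Arrays.asList(1, 2));

        // Bucket 1: the small side has a matching row for key 3.
        Map<Integer, String> bucket1 = new HashMap<>();
        bucket1.put(3, "c");
        op.loadHashTable(bucket1, resetDone);
        return op.process(Arrays.asList(3, 4));
    }

    public static void main(String[] args) {
        System.out.println("without reset: " + secondBucketOutput(false));
        System.out.println("with reset:    " + secondBucketOutput(true));
    }
}
```

Without the reset, the second bucket yields no rows even though key 3 has a
match; with the reset, the match is emitted. For a broadcast join the reloaded
table would be the same one, so the flag would simply be set again.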
> Bucket map joins fail at hash table construction time
> -----------------------------------------------------
>
> Key: HIVE-11606
> URL: https://issues.apache.org/jira/browse/HIVE-11606
> Project: Hive
> Issue Type: Bug
> Components: Tez
> Affects Versions: 1.0.1, 1.2.1
> Reporter: Vikram Dixit K
> Assignee: Vikram Dixit K
> Attachments: HIVE-11606.1.patch, HIVE-11606.2.patch,
> HIVE-11606.3.patch
>
>
> {code}
> info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: java.lang.AssertionError: Capacity must be a power of two
>   at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:186)
>   at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:138)
>   at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
>   at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
>   at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
>   at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:744)
> Caused by: java.lang.RuntimeException: java.lang.AssertionError: Capacity must be a power of two
>   at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:91)
>   at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68)
>   at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:294)
>   at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:163)
>
> {code}
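
The trace above does not include the frame that raised the assertion, so the
following is only a generic sketch of what a "capacity must be a power of two"
check usually looks like, and of one way to round a requested size up so the
check passes. The class and method names are hypothetical and are not taken
from Hive's hash table implementation.

```java
public class PowerOfTwoSketch {

    /** True when n is a positive power of two (exactly one bit set). */
    static boolean isPowerOfTwo(int n) {
        return n > 0 && (n & (n - 1)) == 0;
    }

    /** Rounds n up to the nearest power of two; values below 1 map to 1. */
    static int nextPowerOfTwo(int n) {
        if (n <= 1) {
            return 1;
        }
        if (n > (1 << 30)) {
            // 2^30 is the largest power of two that fits in a positive int.
            throw new IllegalArgumentException("capacity too large: " + n);
        }
        int highest = Integer.highestOneBit(n);
        return highest == n ? n : highest << 1;
    }

    public static void main(String[] args) {
        int requested = 1000; // hypothetical size estimate
        int capacity = nextPowerOfTwo(requested);
        System.out.println(requested + " -> " + capacity
                + " (power of two: " + isPowerOfTwo(capacity) + ")");
    }
}
```

A table that masks hash codes with (capacity - 1) instead of taking a modulus
depends on this property, which is a common reason for such an assertion.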