[ https://issues.apache.org/jira/browse/HIVE-11606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14737335#comment-14737335 ]
Vikram Dixit K commented on HIVE-11606:
---------------------------------------

There is an optimization where, in the case of inner joins, we set the done flag for the operator if the hash table is empty. However, this causes bucket map joins to produce incorrect results under container reuse: the operators in the cached work stop processing records once the done flag has been set, even though a different bucket is now being processed. We prevent caching of the input for bucket map joins, but not caching of the work, which makes sense because the operator pipeline hasn't changed. Ideally, we should reset the done flag only for bucket map joins. This is not a big issue for broadcast joins, because there the same optimization runs again anyway and stops processing early, in the operator initialization (loadHashTable) phase itself.

> Bucket map joins fail at hash table construction time
> -----------------------------------------------------
>
> Key: HIVE-11606
> URL: https://issues.apache.org/jira/browse/HIVE-11606
> Project: Hive
> Issue Type: Bug
> Components: Tez
> Affects Versions: 1.0.1, 1.2.1
> Reporter: Vikram Dixit K
> Assignee: Vikram Dixit K
> Attachments: HIVE-11606.1.patch, HIVE-11606.2.patch, HIVE-11606.3.patch
>
>
> {code}
> info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: java.lang.AssertionError: Capacity must be a power of two
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:186)
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:138)
>     at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
>     at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
>     at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>     at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
>     at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:744)
> Caused by: java.lang.RuntimeException: java.lang.AssertionError: Capacity must be a power of two
>     at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:91)
>     at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68)
>     at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:294)
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:163)
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
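The done-flag interaction described in the comment on HIVE-11606 can be modeled with a small, self-contained Java sketch. Everything in it is hypothetical: `DoneFlagDemo`, `InnerMapJoinOperator`, `runTwoBuckets(...)`, and the `resetDone` parameter are illustrative names, and `loadHashTable(...)` only borrows the phase name from the comment; none of this is Hive's actual code or the change in the attached patches. A single operator instance stands in for the cached work reused across two buckets: an empty small-table bucket sets the done flag, and unless the flag is cleared when the next bucket's hash table is loaded, that bucket's matching rows are silently dropped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified model of the stale done flag; not Hive's real classes. */
public class DoneFlagDemo {

    /** Illustrative inner-join operator whose instance is cached and reused across tasks. */
    static class InnerMapJoinOperator {
        private boolean done = false;
        private Map<Integer, String> hashTable = Collections.emptyMap();

        /** Loads the small-table hash table for one bucket (hypothetical signature). */
        void loadHashTable(Map<Integer, String> table, boolean resetDone) {
            if (resetDone) {
                done = false; // clear state left over from a previously processed bucket
            }
            hashTable = table;
            // Optimization: an inner join against an empty build side produces no rows.
            if (hashTable.isEmpty()) {
                done = true;
            }
        }

        /** Probes big-table keys; returns nothing at all once the done flag is set. */
        List<String> process(List<Integer> bigTableKeys) {
            List<String> out = new ArrayList<>();
            if (done) {
                return out; // records are skipped entirely
            }
            for (int key : bigTableKeys) {
                String match = hashTable.get(key);
                if (match != null) {
                    out.add(key + ":" + match);
                }
            }
            return out;
        }
    }

    /** Runs two buckets through one reused operator, simulating container reuse. */
    static List<String> runTwoBuckets(boolean resetDone) {
        InnerMapJoinOperator op = new InnerMapJoinOperator(); // the cached work

        // Bucket 0: the small-table side is empty, so the done flag gets set.
        op.loadHashTable(Collections.emptyMap(), resetDone);
        List<String> result = new ArrayList<>(op.process(Arrays.asList(1, 2)));

        // Bucket 1: non-empty small-table side, processed by the same operator instance.
        Map<Integer, String> bucket1 = new HashMap<>();
        bucket1.put(3, "c");
        op.loadHashTable(bucket1, resetDone);
        result.addAll(op.process(Arrays.asList(3, 4)));
        return result;
    }

    public static void main(String[] args) {
        System.out.println("without reset: " + runTwoBuckets(false));
        System.out.println("with reset:    " + runTwoBuckets(true));
    }
}
```

Running `main` prints `without reset: []` followed by `with reset:    [3:c]`: the stale flag from the empty bucket suppresses the match for key 3 in the second bucket. Resetting only when a new bucket's table is loaded mirrors the comment's suggestion to reset the flag for bucket map joins alone, since for broadcast joins every task sees the same hash table and the optimization simply fires again.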