[ https://issues.apache.org/jira/browse/DRILL-6453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16535448#comment-16535448 ]
Boaz Ben-Zvi commented on DRILL-6453: ------------------------------------- The "main" major fragment has 9 Hash-Join ops. With the recent "early sniffing" code change, lower Hash-Joins have to actually start executing and probing, while the higher ones are just building their schema. Seems that the failure (DRILL-6517) happened at a fragment where HJ #2 finished "sniffing" (i.e., by calling next() once on both sides). Then at other fragments (where #2 was still sniffing) the bottom Hash-Join (#9) was reading from unordered receiver, and hanging on that read: {code:java} "24c0363c-22d7-ce4b-e248-0da49342192f:frag:4:115" #156 daemon prio=10 os_prio=0 tid=0x00007f8b21d80800 nid=0x3be9 waiting on condition [0x00007f8b0cff7000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x0000000781c2b768> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) at java.util.concurrent.LinkedBlockingDeque.takeFirst(LinkedBlockingDeque.java:492) at java.util.concurrent.LinkedBlockingDeque.take(LinkedBlockingDeque.java:680) at org.apache.drill.exec.work.batch.UnlimitedRawBatchBuffer$UnlimitedBufferQueue.take(UnlimitedRawBatchBuffer.java:61) at org.apache.drill.exec.work.batch.BaseRawBatchBuffer.getNext(BaseRawBatchBuffer.java:170) at org.apache.drill.exec.physical.impl.unorderedreceiver.UnorderedReceiverBatch.getNextBatch(UnorderedReceiverBatch.java:145) at org.apache.drill.exec.physical.impl.unorderedreceiver.UnorderedReceiverBatch.next(UnorderedReceiverBatch.java:163) at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) at org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63) at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:147) at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:172) at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) at org.apache.drill.exec.test.generated.HashJoinProbeGen25.executeProbePhase(HashJoinProbeTemplate.java:242) at org.apache.drill.exec.test.generated.HashJoinProbeGen25.probeAndProject(HashJoinProbeTemplate.java:393) at org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext(HashJoinBatch.java:348) at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:172) at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) at org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63) at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:147) at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:172) at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) at org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63) at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:172) at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) at org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63) at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:172) at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) at org.apache.drill.exec.test.generated.HashJoinProbeGen25.executeProbePhase(HashJoinProbeTemplate.java:242) at org.apache.drill.exec.test.generated.HashJoinProbeGen25.probeAndProject(HashJoinProbeTemplate.java:393) at org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext(HashJoinBatch.java:348) at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:172) at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) at org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch(HashJoinBatch.java:274) at org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides(HashJoinBatch.java:236) at org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema(HashJoinBatch.java:216) at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:152) at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) at org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch(HashJoinBatch.java:274) at org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides(HashJoinBatch.java:236) at org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema(HashJoinBatch.java:216) at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:152) at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) at org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch(HashJoinBatch.java:274) at org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides(HashJoinBatch.java:236) at org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema(HashJoinBatch.java:216) at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:152) at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) at org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch(HashJoinBatch.java:274) at org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides(HashJoinBatch.java:236) at org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema(HashJoinBatch.java:216) at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:152) at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) at org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch(HashJoinBatch.java:274) at org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides(HashJoinBatch.java:236) at org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema(HashJoinBatch.java:216) at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:152) at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) at org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63) at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:147) at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:172) at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) at org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63) at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:172) at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) at org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63) at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:172) at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) at org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch(HashJoinBatch.java:274) at org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides(HashJoinBatch.java:236) at org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema(HashJoinBatch.java:216) at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:152) at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) at org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch(HashJoinBatch.java:274) at org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides(HashJoinBatch.java:236) at org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema(HashJoinBatch.java:216) at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:152) at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) at org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63) at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:147) at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:172) at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) at org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.buildSchema(HashAggBatch.java:199) at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:152) at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) at org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63) at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:147) at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:172) at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:103) at org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext(SingleSenderCreator.java:93) at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:93) at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:294) at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:281) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1633) at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:281) at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) {code} > TPC-DS query 72 has regressed > ----------------------------- > > Key: DRILL-6453 > URL: https://issues.apache.org/jira/browse/DRILL-6453 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow > Affects Versions: 1.14.0 > Reporter: Khurram Faraaz > Assignee: Boaz Ben-Zvi > Priority: Blocker > Fix For: 1.14.0 > > Attachments: 24f75b18-014a-fb58-21d2-baeab5c3352c.sys.drill > > > TPC-DS query 72 seems to have regressed, query profile for the case where it > Canceled after 2 hours on Drill 1.14.0 is attached here. > {noformat} > On, Drill 1.14.0-SNAPSHOT > commit : 931b43e (TPC-DS query 72 executed successfully on this commit, took > around 55 seconds to execute) > SF1 parquet data on 4 nodes; > planner.memory.max_query_memory_per_node = 10737418240. > drill.exec.hashagg.fallback.enabled = true > TPC-DS query 72 executed successfully & took 47 seconds to complete execution. > {noformat} > {noformat} > TPC-DS data in the below run has date values stored as DATE datatype and not > VARCHAR type > On, Drill 1.14.0-SNAPSHOT > commit : 82e1a12 > SF1 parquet data on 4 nodes; > planner.memory.max_query_memory_per_node = 10737418240. > drill.exec.hashagg.fallback.enabled = true > and > alter system set `exec.hashjoin.num_partitions` = 1; > TPC-DS query 72 executed for 2 hrs and 11 mins and did not complete, I had to > Cancel it by stopping the Foreman drillbit. > As a result several minor fragments are reported to be in > CANCELLATION_REQUESTED state on UI. > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)