[
https://issues.apache.org/jira/browse/DRILL-6453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16535448#comment-16535448
]
Boaz Ben-Zvi commented on DRILL-6453:
-------------------------------------
The "main" major fragment has 9 Hash-Join ops. With the recent "early sniffing"
code change, lower Hash-Joins have to actually start executing and probing,
while the higher ones are just building their schema.
Seems that the failure (DRILL-6517) happened at a fragment where HJ #2
finished "sniffing" (i.e., by calling next() once on both sides). Then at other
fragments (where #2 was still sniffing) the bottom Hash-Join (#9) was reading
from unordered receiver, and hanging on that read:
{code:java}
"24c0363c-22d7-ce4b-e248-0da49342192f:frag:4:115" #156 daemon prio=10 os_prio=0
tid=0x00007f8b21d80800 nid=0x3be9 waiting on condition [0x00007f8b0cff7000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x0000000781c2b768> (a
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at
java.util.concurrent.LinkedBlockingDeque.takeFirst(LinkedBlockingDeque.java:492)
at
java.util.concurrent.LinkedBlockingDeque.take(LinkedBlockingDeque.java:680)
at
org.apache.drill.exec.work.batch.UnlimitedRawBatchBuffer$UnlimitedBufferQueue.take(UnlimitedRawBatchBuffer.java:61)
at
org.apache.drill.exec.work.batch.BaseRawBatchBuffer.getNext(BaseRawBatchBuffer.java:170)
at
org.apache.drill.exec.physical.impl.unorderedreceiver.UnorderedReceiverBatch.getNextBatch(UnorderedReceiverBatch.java:145)
at
org.apache.drill.exec.physical.impl.unorderedreceiver.UnorderedReceiverBatch.next(UnorderedReceiverBatch.java:163)
at
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
at
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
at
org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63)
at
org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:147)
at
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:172)
at
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
at
org.apache.drill.exec.test.generated.HashJoinProbeGen25.executeProbePhase(HashJoinProbeTemplate.java:242)
at
org.apache.drill.exec.test.generated.HashJoinProbeGen25.probeAndProject(HashJoinProbeTemplate.java:393)
at
org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext(HashJoinBatch.java:348)
at
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:172)
at
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
at
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
at
org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63)
at
org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:147)
at
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:172)
at
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
at
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
at
org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63)
at
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:172)
at
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
at
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
at
org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63)
at
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:172)
at
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
at
org.apache.drill.exec.test.generated.HashJoinProbeGen25.executeProbePhase(HashJoinProbeTemplate.java:242)
at
org.apache.drill.exec.test.generated.HashJoinProbeGen25.probeAndProject(HashJoinProbeTemplate.java:393)
at
org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext(HashJoinBatch.java:348)
at
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:172)
at
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
at
org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch(HashJoinBatch.java:274)
at
org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides(HashJoinBatch.java:236)
at
org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema(HashJoinBatch.java:216)
at
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:152)
at
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
at
org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch(HashJoinBatch.java:274)
at
org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides(HashJoinBatch.java:236)
at
org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema(HashJoinBatch.java:216)
at
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:152)
at
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
at
org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch(HashJoinBatch.java:274)
at
org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides(HashJoinBatch.java:236)
at
org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema(HashJoinBatch.java:216)
at
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:152)
at
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
at
org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch(HashJoinBatch.java:274)
at
org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides(HashJoinBatch.java:236)
at
org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema(HashJoinBatch.java:216)
at
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:152)
at
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
at
org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch(HashJoinBatch.java:274)
at
org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides(HashJoinBatch.java:236)
at
org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema(HashJoinBatch.java:216)
at
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:152)
at
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
at
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
at
org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63)
at
org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:147)
at
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:172)
at
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
at
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
at
org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63)
at
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:172)
at
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
at
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
at
org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63)
at
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:172)
at
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
at
org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch(HashJoinBatch.java:274)
at
org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides(HashJoinBatch.java:236)
at
org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema(HashJoinBatch.java:216)
at
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:152)
at
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
at
org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch(HashJoinBatch.java:274)
at
org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides(HashJoinBatch.java:236)
at
org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema(HashJoinBatch.java:216)
at
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:152)
at
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
at
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
at
org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63)
at
org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:147)
at
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:172)
at
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
at
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
at
org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.buildSchema(HashAggBatch.java:199)
at
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:152)
at
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
at
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
at
org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63)
at
org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:147)
at
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:172)
at
org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:103)
at
org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext(SingleSenderCreator.java:93)
at
org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:93)
at
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:294)
at
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:281)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1633)
at
org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:281)
at
org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{code}
> TPC-DS query 72 has regressed
> -----------------------------
>
> Key: DRILL-6453
> URL: https://issues.apache.org/jira/browse/DRILL-6453
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Flow
> Affects Versions: 1.14.0
> Reporter: Khurram Faraaz
> Assignee: Boaz Ben-Zvi
> Priority: Blocker
> Fix For: 1.14.0
>
> Attachments: 24f75b18-014a-fb58-21d2-baeab5c3352c.sys.drill
>
>
> TPC-DS query 72 seems to have regressed, query profile for the case where it
> Canceled after 2 hours on Drill 1.14.0 is attached here.
> {noformat}
> On, Drill 1.14.0-SNAPSHOT
> commit : 931b43e (TPC-DS query 72 executed successfully on this commit, took
> around 55 seconds to execute)
> SF1 parquet data on 4 nodes;
> planner.memory.max_query_memory_per_node = 10737418240.
> drill.exec.hashagg.fallback.enabled = true
> TPC-DS query 72 executed successfully & took 47 seconds to complete execution.
> {noformat}
> {noformat}
> TPC-DS data in the below run has date values stored as DATE datatype and not
> VARCHAR type
> On, Drill 1.14.0-SNAPSHOT
> commit : 82e1a12
> SF1 parquet data on 4 nodes;
> planner.memory.max_query_memory_per_node = 10737418240.
> drill.exec.hashagg.fallback.enabled = true
> and
> alter system set `exec.hashjoin.num_partitions` = 1;
> TPC-DS query 72 executed for 2 hrs and 11 mins and did not complete, I had to
> Cancel it by stopping the Foreman drillbit.
> As a result several minor fragments are reported to be in
> CANCELLATION_REQUESTED state on UI.
> {noformat}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)