[
https://issues.apache.org/jira/browse/DRILL-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16532150#comment-16532150
]
salim achouche commented on DRILL-6517:
---------------------------------------
* I ran the query about 10 times and it succeeded each time (completing in 29
minutes)
* I then bounced the Drillbit cluster, and one of the nodes immediately became
unresponsive
* I launched a script to gather jstacks every minute; the jstack command failed
and I got the kernel messages below
* According to VMware blogs, such messages indicate that the VM is running out
of resources
* Interestingly, the Java IllegalStateException showed up again when the
cancellation happened:
Caused by: java.lang.IllegalStateException: Record count not set for this vector container
Message from syslogd@mfs133 at Jul 3 18:48:27 ...
 kernel:NMI watchdog: BUG: soft lockup - CPU#6 stuck for 21s! [java:12219]
Message from syslogd@mfs133 at Jul 3 18:48:27 ...
 kernel:NMI watchdog: BUG: soft lockup - CPU#3 stuck for 25s! [java:16991]
Message from syslogd@mfs133 at Jul 3 18:48:27 ...
 kernel:NMI watchdog: BUG: soft lockup - CPU#4 stuck for 25s! [java:17633]
Message from syslogd@mfs133 at Jul 3 18:48:27 ...
 kernel:NMI watchdog: BUG: soft lockup - CPU#5 stuck for 25s! [java:27059]
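
For anyone who wants to repeat the jstack collection, the periodic sampling described above could be sketched roughly as follows. This is only an illustration, not the script that was actually used: the class name JstackSampler, the output file naming, and the 50 second timeout are all assumptions. The only external tool it relies on is the standard JDK command `jstack <pid>`.

```java
import java.io.File;
import java.io.IOException;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: dumps the threads of one JVM once a minute.
public class JstackSampler {
    private static final DateTimeFormatter STAMP =
        DateTimeFormatter.ofPattern("yyyyMMdd-HHmmss");

    // Command line for a single thread dump of the given process.
    public static List<String> command(long pid) {
        return Arrays.asList("jstack", Long.toString(pid));
    }

    // Assumed naming scheme: one output file per sample.
    public static String outputName(long pid, LocalDateTime when) {
        return "jstack-" + pid + "-" + STAMP.format(when) + ".txt";
    }

    // Runs jstack once, writing stdout and stderr to a timestamped file.
    static void sampleOnce(long pid, File dir) {
        File out = new File(dir, outputName(pid, LocalDateTime.now()));
        try {
            Process p = new ProcessBuilder(command(pid))
                .redirectErrorStream(true)
                .redirectOutput(out)
                .start();
            // Give up on a hung jstack so the next sample can still run.
            if (!p.waitFor(50, TimeUnit.SECONDS)) {
                p.destroyForcibly();
                System.err.println("jstack timed out for pid " + pid);
            }
        } catch (IOException e) {
            System.err.println("jstack failed: " + e.getMessage());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        long pid = Long.parseLong(args[0]);
        File dir = new File(args.length > 1 ? args[1] : ".");
        ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor();
        // First sample immediately, then one per minute until the JVM is killed.
        timer.scheduleAtFixedRate(() -> sampleOnce(pid, dir), 0, 1, TimeUnit.MINUTES);
    }
}
```

It would be started with the Drillbit process id as the first argument and, optionally, an output directory as the second. On a node that is already soft-locked, jstack itself may hang or fail, which is why the sketch bounds each attempt with a timeout.
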
> IllegalStateException: Record count not set for this vector container
> ---------------------------------------------------------------------
>
> Key: DRILL-6517
> URL: https://issues.apache.org/jira/browse/DRILL-6517
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Flow
> Affects Versions: 1.14.0
> Reporter: Khurram Faraaz
> Assignee: salim achouche
> Priority: Critical
> Fix For: 1.14.0
>
> Attachments: 24d7b377-7589-7928-f34f-57d02061acef.sys.drill
>
>
> The TPC-DS query is canceled after 2 hrs and 47 mins, and drillbit.log shows
> an IllegalStateException: Record count not set for this vector container.
> Steps to reproduce the problem are below; the query profile
> (24d7b377-7589-7928-f34f-57d02061acef) is attached.
> {noformat}
> In drill-env.sh set max direct memory to 12G on all 4 nodes in cluster
> export DRILL_MAX_DIRECT_MEMORY=${DRILL_MAX_DIRECT_MEMORY:-"12G"}
> and set these options from sqlline,
> alter system set `planner.memory.max_query_memory_per_node` = 10737418240;
> alter system set `drill.exec.hashagg.fallback.enabled` = true;
> To run the query (replace IP-ADDRESS with your foreman node's IP address)
> cd /opt/mapr/drill/drill-1.14.0/bin
> ./sqlline -u "jdbc:drill:schema=dfs.tpcds_sf1_parquet_views;drillbit=<IP-ADDRESS>" -f /root/query72.sql
> {noformat}
> Stack trace from drillbit.log
> {noformat}
> 2018-06-18 20:08:51,912 [24d7b377-7589-7928-f34f-57d02061acef:frag:4:49] ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: IllegalStateException: Record count not set for this vector container
> Fragment 4:49
> [Error Id: 73177a1c-f7aa-4c9e-99e1-d6e1280e3f27 on qa102-45.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: IllegalStateException: Record count not set for this vector container
> Fragment 4:49
> [Error Id: 73177a1c-f7aa-4c9e-99e1-d6e1280e3f27 on qa102-45.qa.lab:31010]
>     at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633) ~[drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>     at org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:361) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>     at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:216) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>     at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:327) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>     at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_161]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_161]
>     at java.lang.Thread.run(Thread.java:748) [na:1.8.0_161]
> Caused by: java.lang.IllegalStateException: Record count not set for this vector container
>     at com.google.common.base.Preconditions.checkState(Preconditions.java:173) ~[guava-18.0.jar:na]
>     at org.apache.drill.exec.record.VectorContainer.getRecordCount(VectorContainer.java:394) ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>     at org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.getRecordCount(RemovingRecordBatch.java:49) ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>     at org.apache.drill.exec.record.RecordBatchSizer.<init>(RecordBatchSizer.java:690) ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>     at org.apache.drill.exec.record.RecordBatchSizer.<init>(RecordBatchSizer.java:662) ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>     at org.apache.drill.exec.record.JoinBatchMemoryManager.update(JoinBatchMemoryManager.java:73) ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>     at org.apache.drill.exec.record.JoinBatchMemoryManager.update(JoinBatchMemoryManager.java:79) ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>     at org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides(HashJoinBatch.java:242) ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>     at org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema(HashJoinBatch.java:218) ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:152) ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>     at org.apache.drill.exec.physical.impl.join.HashJoinBatch.sniffNonEmptyBatch(HashJoinBatch.java:276) ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>     at org.apache.drill.exec.physical.impl.join.HashJoinBatch.prefetchFirstBatchFromBothSides(HashJoinBatch.java:238) ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>     at org.apache.drill.exec.physical.impl.join.HashJoinBatch.buildSchema(HashJoinBatch.java:218) ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:152) ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>     at org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63) ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>     at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:137) ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:172) ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>     at org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.buildSchema(HashAggBatch.java:119) ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:152) ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>     at org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63) ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>     at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:137) ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:172) ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>     at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:103) ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>     at org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext(SingleSenderCreator.java:93) ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>     at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:93) ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>     at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:294) ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>     at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:281) ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>     at java.security.AccessController.doPrivileged(Native Method) ~[na:1.8.0_161]
>     at javax.security.auth.Subject.doAs(Subject.java:422) ~[na:1.8.0_161]
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595) ~[hadoop-common-2.7.0-mapr-1707.jar:na]
>     at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:281) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
>     ... 4 common frames omitted
> {noformat}
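
The trace above shows Preconditions.checkState failing inside VectorContainer.getRecordCount, reached from the RecordBatchSizer constructor while HashJoinBatch was prefetching its first batches. The sketch below illustrates the general kind of guard that produces this message. It is not the Drill source: the class RecordCountSketch, the nested Container, and the UNSET sentinel are hypothetical, and a plain `throw` stands in for Guava's checkState so the example has no dependencies.

```java
// Hypothetical stand-in for a container whose row count must be set
// explicitly before anyone is allowed to read it.
public class RecordCountSketch {
    public static final class Container {
        private static final int UNSET = -1;   // assumed sentinel value
        private int recordCount = UNSET;

        public boolean hasRecordCount() {
            return recordCount != UNSET;
        }

        public void setRecordCount(int count) {
            recordCount = count;
        }

        // Reading the count before it was set is treated as a programming
        // error, which is what the "Caused by" line in the trace reports.
        public int getRecordCount() {
            if (!hasRecordCount()) {
                throw new IllegalStateException(
                    "Record count not set for this vector container");
            }
            return recordCount;
        }
    }

    public static void main(String[] args) {
        Container container = new Container();
        try {
            container.getRecordCount();          // count was never set
        } catch (IllegalStateException e) {
            System.out.println("Failed as expected: " + e.getMessage());
        }
        container.setRecordCount(42);
        System.out.println("After setting: " + container.getRecordCount());
    }
}
```

Read this way, the exception suggests that some code path asked for the size of a batch whose producer had not yet recorded a row count. Whether that is what actually happens here still has to be confirmed against the Drill code.
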