[
https://issues.apache.org/jira/browse/DRILL-2907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chun Chang closed DRILL-2907.
-----------------------------
Assignee: Chun Chang (was: Jacques Nadeau)
Ran same test on 1.0 GA build for over 72 hours. No memory/resource issues
noticed.
> Drill performance degrades significantly over time - resource leak
> ------------------------------------------------------------------
>
> Key: DRILL-2907
> URL: https://issues.apache.org/jira/browse/DRILL-2907
> Project: Apache Drill
> Issue Type: Bug
> Components: Functions - Drill
> Affects Versions: 0.9.0
> Reporter: Chun Chang
> Assignee: Chun Chang
> Priority: Critical
> Fix For: 1.0.0
>
> Attachments: drill.jstack, drill.native_memory
>
>
> #Generated by Git-Commit-Id-Plugin
> #Mon Apr 27 23:19:19 EDT 2015
> git.commit.id.abbrev=57a96d2
> [email protected]
> git.commit.message.full=DRILL-2083\: Fix bug in merging receiver\n
> git.commit.id=57a96d200e12c0efcad3f3ca9d935c42647234b1
> Running Functional/Passing/complex suit in a loop, on a four node cluster,
> with cmd:
> for i in `seq 1 1000`; do echo $i; time ./run.sh -s
> Functional/Passing/complex/json -g smoke,functional,regression -t 400 -n
> 10;sleep 1;done
> Initially, one iteration took about 1 min, but after 300 iterations, it took
> about 38 min, considerably longer, and about 10% tests timeout.
> Drillbit memory consumption increased from less than 1G initially to more
> than 5G. Even after I stopped the test, drillbit CPU still spinning.
> {code}
> Tasks: 1 total, 0 running, 1 sleeping, 0 stopped, 0 zombie
> Cpu(s): 11.5%us, 0.0%sy, 0.0%ni, 88.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Mem: 49416696k total, 28344300k used, 21072396k free, 218004k buffers
> Swap: 52428792k total, 0k used, 52428792k free, 1343096k cached
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 9889 mapr 20 0 7360m 5.1g 37m S 275.7 10.9 2077:45 java
> {code}
> I see the following exception in drill log:
> {code}
> 2015-04-29 09:29:54,679 [2abefae4-f2f5-6db8-c60d-6bfb30d2fcbf:frag:0:0] INFO
> o.a.drill.exec.work.foreman.Foreman - foreman cleaning up.
> 2015-04-29 09:29:54,713 [2abefa81-6079-5166-aeab-e9fd3d450af6:frag:0:0] INFO
> o.a.d.e.w.fragment.FragmentExecutor -
> 2abefa81-6079-5166-aeab-e9fd3d450af6:0:0: State change requested from
> CANCELLATION_REQUESTED --> FAILED for
> 2015-04-29 09:29:54,717 [2abefa81-6079-5166-aeab-e9fd3d450af6:frag:0:0] ERROR
> o.a.d.c.e.DrillRuntimeException - SYSTEM ERROR: You tried to do a batch data
> read operation when you were in a state of STOP. You can only do this type
> of operation when you are in a state of OK or OK_NEW_SCHEMA.
> Fragment 0:0
> [468261de-df41-48fa-89f7-55a9cf23fc49 on qa-node120.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: You tried to
> do a batch data read operation when you were in a state of STOP. You can
> only do this type of operation when you are in a state of OK or OK_NEW_SCHEMA.
> Fragment 0:0
> [468261de-df41-48fa-89f7-55a9cf23fc49 on qa-node120.qa.lab:31010]
> at
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:465)
> ~[drill-common-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:248)
> [drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:218)
> [drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
> [drill-common-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> [na:1.7.0_45]
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> [na:1.7.0_45]
> at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
> Caused by: java.lang.IllegalStateException: You tried to do a batch data read
> operation when you were in a state of STOP. You can only do this type of
> operation when you are in a state of OK or OK_NEW_SCHEMA.
> at
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.validateReadState(IteratorValidatorBatchIterator.java:54)
> ~[drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.getValueVectorId(IteratorValidatorBatchIterator.java:103)
> ~[drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at
> org.apache.drill.exec.expr.ExpressionTreeMaterializer$MaterializeVisitor.visitSchemaPath(ExpressionTreeMaterializer.java:423)
> ~[drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at
> org.apache.drill.exec.expr.ExpressionTreeMaterializer$MaterializeVisitor.visitSchemaPath(ExpressionTreeMaterializer.java:188)
> ~[drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at
> org.apache.drill.common.expression.SchemaPath.accept(SchemaPath.java:160)
> ~[drill-common-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at
> org.apache.drill.exec.expr.ExpressionTreeMaterializer.materialize(ExpressionTreeMaterializer.java:104)
> ~[drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at
> org.apache.drill.exec.expr.ExpressionTreeMaterializer.materialize(ExpressionTreeMaterializer.java:90)
> ~[drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at
> org.apache.drill.exec.physical.impl.join.MergeJoinBatch.materializeExpression(MergeJoinBatch.java:547)
> ~[drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at
> org.apache.drill.exec.physical.impl.join.MergeJoinBatch.generateNewWorker(MergeJoinBatch.java:372)
> ~[drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at
> org.apache.drill.exec.physical.impl.join.MergeJoinBatch.innerNext(MergeJoinBatch.java:188)
> ~[drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:144)
> ~[drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118)
> ~[drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:101)
> ~[drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:91)
> ~[drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
> ~[drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:130)
> ~[drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:144)
> ~[drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118)
> ~[drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:101)
> ~[drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:91)
> ~[drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
> ~[drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:130)
> ~[drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:144)
> ~[drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118)
> ~[drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at
> org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:74)
> ~[drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:80)
> ~[drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at
> org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:64)
> ~[drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:198)
> ~[drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:192)
> ~[drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at java.security.AccessController.doPrivileged(Native Method)
> ~[na:1.7.0_45]
> at javax.security.auth.Subject.doAs(Subject.java:415) ~[na:1.7.0_45]
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1469)
> ~[hadoop-common-2.4.1-mapr-1408.jar:na]
> at
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:192)
> [drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> ... 4 common frames omitted
> 2015-04-29 09:29:54,721 [2abefa81-6079-5166-aeab-e9fd3d450af6:frag:0:0] INFO
> o.a.drill.exec.work.foreman.Foreman - State change requested.
> CANCELLATION_REQUESTED --> FAILED
> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: You
> tried to do a batch data read operation when you were in a state of STOP.
> You can only do this type of operation when you are in a state of OK or
> OK_NEW_SCHEMA.
> Fragment 0:0
> [468261de-df41-48fa-89f7-55a9cf23fc49 on qa-node120.qa.lab:31010]
> at
> org.apache.drill.exec.work.foreman.QueryManager$1.statusUpdate(QueryManager.java:409)
> [drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at
> org.apache.drill.exec.work.foreman.QueryManager$RootStatusReporter.statusChange(QueryManager.java:389)
> [drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at
> org.apache.drill.exec.work.fragment.AbstractStatusReporter.fail(AbstractStatusReporter.java:90)
> [drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at
> org.apache.drill.exec.work.fragment.AbstractStatusReporter.fail(AbstractStatusReporter.java:86)
> [drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:252)
> [drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:218)
> [drill-java-exec-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
> [drill-common-0.9.0-SNAPSHOT-rebuffed.jar:0.9.0-SNAPSHOT]
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> [na:1.7.0_45]
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> [na:1.7.0_45]
> at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
> 2015-04-29 09:29:54,722 [2abefa81-6079-5166-aeab-e9fd3d450af6:frag:0:0] INFO
> o.a.drill.exec.work.foreman.Foreman - foreman cleaning up.
> 2015-04-29 09:29:54,738 [2abefa81-6079-5166-aeab-e9fd3d450af6:frag:0:0] INFO
> o.a.drill.exec.work.foreman.Foreman - State change requested. CANCELED -->
> COMPLETED
> 2015-04-29 09:29:54,738 [2abefa81-6079-5166-aeab-e9fd3d450af6:frag:0:0] WARN
> o.a.drill.exec.work.foreman.Foreman - Dropping request to move to COMPLETED
> state as query is already at CANCELED state (which is terminal).
> 2015-04-29 09:29:54,760 [2abefa80-9f9b-00de-4655-c4620c86096a:frag:0:0] INFO
> o.a.d.e.w.fragment.FragmentExecutor -
> 2abefa80-9f9b-00de-4655-c4620c86096a:0:0: State change requested from
> CANCELLATION_REQUESTED --> FINISHED for
> 2015-04-29 09:29:54,760 [2abefa80-9f9b-00de-4655-c4620c86096a:frag:0:0] INFO
> o.a.d.e.w.f.AbstractStatusReporter - State changed for
> 2abefa80-9f9b-00de-4655-c4620c86096a:0:0. New state: CANCELLED
> 2015-04-29 09:29:54,760 [2abefa80-9f9b-00de-4655-c4620c86096a:frag:0:0] INFO
> o.a.drill.exec.work.foreman.Foreman - State change requested.
> CANCELLATION_REQUESTED --> COMPLETED
> 2015-04-29 09:29:54,761 [2abefa80-9f9b-00de-4655-c4620c86096a:frag:0:0] INFO
> o.a.drill.exec.work.foreman.Foreman - foreman cleaning up.
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)