[ 
https://issues.apache.org/jira/browse/DRILL-7517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17010961#comment-17010961
 ] 

Nitin Pawar commented on DRILL-7517:
------------------------------------

[~volodymyr], I am afraid I can not download the data files to local as that's 
customer data. 

These queries run perfectly fine on Drill-1.13 version which we are using 
currently. Issue related to OOM comes only for 1.16

Current guideline of having HEAP of 4 to 8 GB from the documentation, I kept it 
to 12GB.

For now I have shutdown the 1.16 instances of drill as it had major impact on 
our production infra.

Our current load of queries will be maximum 100 queries on drill cluster 
parallel at any given point of time.  There might be other queries such as 
"explain plan extended" to know more details of the queries but I am not sure 
if that causes lots of issues as they are not processing any data as the plan 
we always run against cp.`employee.json`

 

I will try to get heapdump and share it with you tomorrow

> Drill 1.16.0 shuts down frequently
> ----------------------------------
>
>                 Key: DRILL-7517
>                 URL: https://issues.apache.org/jira/browse/DRILL-7517
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.16.0
>            Reporter: Nitin Pawar
>            Priority: Critical
>
> We see following exception every few hours
> Our drillbit cluster queries data from S3. The only queries we make to web 
> interface are for explain plan and no actual query goes via WEB UI. 
> here is the full exception
> 2020-01-07 16:34:02,922 [qtp80683229-962] INFO 
> o.a.d.exec.server.rest.QueryWrapper - User Error Occurred: There is not 
> enough heap memory to run this query using the web interface. 
> org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: There is 
> not enough heap memory to run this query using the web interface. 
> Please try a query with fewer columns or with a filter or limit condition to 
> limit the data returned. 
> You can also try an ODBC/JDBC client. 
> [Error Id: 7ad61839-a2e8-4fdd-a600-e662fc0f03e0 ]
>  at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:630)
>  ~[drill-common-1.16.0.jar:1.16.0]
>  at org.apache.drill.exec.server.rest.QueryWrapper.run(QueryWrapper.java:115) 
> [drill-java-exec-1.16.0.jar:1.16.0]
>  at 
> org.apache.drill.exec.server.rest.QueryResources.submitQueryJSON(QueryResources.java:74)
>  [drill-java-exec-1.16.0.jar:1.16.0]
>  at sun.reflect.GeneratedMethodAccessor212.invoke(Unknown Source) ~[na:na]
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[na:1.8.0_222]
>  at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_222]
>  at 
> org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$1.invoke(ResourceMethodInvocationHandlerFactory.java:81)
>  [jersey-server-2.25.1.jar:na]
>  at 
> org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:144)
>  [jersey-server-2.25.1.jar:na]
>  at 
> org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:161)
>  [jersey-server-2.25.1.jar:na]
>  at 
> org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$TypeOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:205)
>  [jersey-server-2.25.1.jar:na]
>  at 
> org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:99)
>  [jersey-server-2.25.1.jar:na]
>  at 
> org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:389)
>  [jersey-server-2.25.1.jar:na]
>  at 
> org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:347)
>  [jersey-server-2.25.1.jar:na]
>  at 
> org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:102)
>  [jersey-server-2.25.1.jar:na]
>  at org.glassfish.jersey.server.ServerRuntime$2.run(ServerRuntime.java:326) 
> [jersey-server-2.25.1.jar:na]
>  at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271) 
> [jersey-common-2.25.1.jar:na]
>  at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267) 
> [jersey-common-2.25.1.jar:na]
>  at org.glassfish.jersey.internal.Errors.process(Errors.java:315) 
> [jersey-common-2.25.1.jar:na]
>  at org.glassfish.jersey.internal.Errors.process(Errors.java:297) 
> [jersey-common-2.25.1.jar:na]
>  at org.glassfish.jersey.internal.Errors.process(Errors.java:267) 
> [jersey-common-2.25.1.jar:na]
>  at 
> org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:317)
>  [jersey-common-2.25.1.jar:na]
>  at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:305) 
> [jersey-server-2.25.1.jar:na]
>  at 
> org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1154)
>  [jersey-server-2.25.1.jar:na]
>  at 
> org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:473) 
> [jersey-container-servlet-core-2.25.1.jar:na]
>  at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:427) 
> [jersey-container-servlet-core-2.25.1.jar:na]
>  at 
> org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:388)
>  [jersey-container-servlet-core-2.25.1.jar:na]
>  at 
> org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:341)
>  [jersey-container-servlet-core-2.25.1.jar:na]
>  at 
> org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:228)
>  [jersey-container-servlet-core-2.25.1.jar:na]
>  at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848) 
> [jetty-servlet-9.3.25.v20180904.jar:9.3.25.v20180904]
>  at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585) 
> [jetty-servlet-9.3.25.v20180904.jar:9.3.25.v20180904]
>  at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:224)
>  [jetty-server-9.3.25.v20180904.jar:9.3.25.v20180904]
>  at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>  [jetty-server-9.3.25.v20180904.jar:9.3.25.v20180904]
>  at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:513) 
> [jetty-servlet-9.3.25.v20180904.jar:9.3.25.v20180904]
>  at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>  [jetty-server-9.3.25.v20180904.jar:9.3.25.v20180904]
>  at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
>  [jetty-server-9.3.25.v20180904.jar:9.3.25.v20180904]
>  at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) 
> [jetty-server-9.3.25.v20180904.jar:9.3.25.v20180904]
>  at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>  [jetty-server-9.3.25.v20180904.jar:9.3.25.v20180904]
>  at org.eclipse.jetty.server.Server.handle(Server.java:539) 
> [jetty-server-9.3.25.v20180904.jar:9.3.25.v20180904]
>  at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:333) 
> [jetty-server-9.3.25.v20180904.jar:9.3.25.v20180904]
>  at 
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251) 
> [jetty-server-9.3.25.v20180904.jar:9.3.25.v20180904]
>  at 
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
>  [jetty-io-9.3.25.v20180904.jar:9.3.25.v20180904]
>  at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108) 
> [jetty-io-9.3.25.v20180904.jar:9.3.25.v20180904]
>  at 
> org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
>  [jetty-io-9.3.25.v20180904.jar:9.3.25.v20180904]
>  at 
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
>  [jetty-util-9.3.25.v20180904.jar:9.3.25.v20180904]
>  at 
> org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
>  [jetty-util-9.3.25.v20180904.jar:9.3.25.v20180904]
>  at java.lang.Thread.run(Thread.java:748) [na:1.8.0_222]
> 2020-01-07 16:34:02,922 [21eb5269-6632-66b7-40c9-ce652853add2:frag:1:0] ERROR 
> o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: IllegalStateException: 
> Connection pool shut down
> Fragment 1:0
> Please, refer to logs for more information.
> [Error Id: 1f14f398-fab2-40cd-b14c-389d91c002e8 on 10.0.160.5:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> IllegalStateException: Connection pool shut down
> Fragment 1:0
> Please, refer to logs for more information.
> [Error Id: 1f14f398-fab2-40cd-b14c-389d91c002e8 on 10.0.160.5:31010]
>  at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:630)
>  ~[drill-common-1.16.0.jar:1.16.0]
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:363)
>  [drill-java-exec-1.16.0.jar:1.16.0]
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:219)
>  [drill-java-exec-1.16.0.jar:1.16.0]
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:329)
>  [drill-java-exec-1.16.0.jar:1.16.0]
>  at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.16.0.jar:1.16.0]
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  [na:1.8.0_222]
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  [na:1.8.0_222]
>  at java.lang.Thread.run(Thread.java:748) [na:1.8.0_222]
> Caused by: java.lang.IllegalStateException: Connection pool shut down
>  at org.apache.http.pool.AbstractConnPool.lease(AbstractConnPool.java:167) 
> ~[httpcore-4.2.4.jar:4.2.4]
>  at org.apache.http.pool.AbstractConnPool.lease(AbstractConnPool.java:199) 
> ~[httpcore-4.2.4.jar:4.2.4]
>  at 
> org.apache.http.impl.conn.PoolingClientConnectionManager.requestConnection(PoolingClientConnectionManager.java:188)
>  ~[httpclient-4.2.5.jar:4.2.5]
>  at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source) ~[na:na]
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[na:1.8.0_222]
>  at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_222]
>  at 
> com.amazonaws.http.conn.ClientConnectionManagerFactory$Handler.invoke(ClientConnectionManagerFactory.java:72)
>  ~[aws-java-sdk-1.7.4.jar:na]
>  at com.amazonaws.http.conn.$Proxy76.requestConnection(Unknown Source) 
> ~[na:na]
>  at 
> org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:448)
>  ~[httpclient-4.2.5.jar:4.2.5]
>  at 
> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
>  ~[httpclient-4.2.5.jar:4.2.5]
>  at 
> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
>  ~[httpclient-4.2.5.jar:4.2.5]
>  at 
> com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:384) 
> ~[aws-java-sdk-1.7.4.jar:na]
>  at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:232) 
> ~[aws-java-sdk-1.7.4.jar:na]
>  at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3528) 
> ~[aws-java-sdk-1.7.4.jar:na]
>  at 
> com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:976)
>  ~[aws-java-sdk-1.7.4.jar:na]
>  at 
> com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:956)
>  ~[aws-java-sdk-1.7.4.jar:na]
>  at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:892) 
> ~[hadoop-aws-2.7.4.jar:na]
>  at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:77) 
> ~[hadoop-aws-2.7.4.jar:na]
>  at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1426) 
> ~[hadoop-common-2.7.4.jar:na]
>  at 
> org.apache.drill.exec.store.StorageStrategy.getNonExistentLocations(StorageStrategy.java:200)
>  ~[drill-java-exec-1.16.0.jar:1.16.0]
>  at 
> org.apache.drill.exec.store.StorageStrategy.createFileAndApply(StorageStrategy.java:134)
>  ~[drill-java-exec-1.16.0.jar:1.16.0]
>  at 
> org.apache.drill.exec.store.text.DrillTextRecordWriter.startNewSchema(DrillTextRecordWriter.java:92)
>  ~[drill-java-exec-1.16.0.jar:1.16.0]
>  at 
> org.apache.drill.exec.store.StringOutputRecordWriter.updateSchema(StringOutputRecordWriter.java:44)
>  ~[drill-java-exec-1.16.0.jar:1.16.0]
>  at 
> org.apache.drill.exec.physical.impl.WriterRecordBatch.setupNewSchema(WriterRecordBatch.java:160)
>  ~[drill-java-exec-1.16.0.jar:1.16.0]
>  at 
> org.apache.drill.exec.physical.impl.WriterRecordBatch.innerNext(WriterRecordBatch.java:108)
>  ~[drill-java-exec-1.16.0.jar:1.16.0]
>  at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:186)
>  ~[drill-java-exec-1.16.0.jar:1.16.0]
>  at 
> org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:104) 
> ~[drill-java-exec-1.16.0.jar:1.16.0]
>  at 
> org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext(SingleSenderCreator.java:93)
>  ~[drill-java-exec-1.16.0.jar:1.16.0]
>  at 
> org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:94) 
> ~[drill-java-exec-1.16.0.jar:1.16.0]
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:296)
>  ~[drill-java-exec-1.16.0.jar:1.16.0]
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:283)
>  ~[drill-java-exec-1.16.0.jar:1.16.0]
>  at java.security.AccessController.doPrivileged(Native Method) ~[na:1.8.0_222]
>  at javax.security.auth.Subject.doAs(Subject.java:422) ~[na:1.8.0_222]
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746)
>  ~[hadoop-common-2.7.4.jar:na]
>  at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:283)
>  [drill-java-exec-1.16.0.jar:1.16.0]
>  ... 4 common frames omitted
> 2020-01-07 16:34:02,922 [UserServer-1] WARN 
> o.a.drill.exec.rpc.user.UserServer - Message of mode REQUEST of rpc type 3 
> took longer than 500ms. Actual duration was 51169ms.
> 2020-01-07 16:34:28,904 [21eb4fef-3216-f613-4c22-9a88ee20854b:frag:1:6] INFO 
> o.a.d.e.w.fragment.FragmentExecutor - 
> 21eb4fef-3216-f613-4c22-9a88ee20854b:1:6: State change requested 
> AWAITING_ALLOCATION --> FAILED
> 2020-01-07 16:34:28,904 [21eb4fef-3216-f613-4c22-9a88ee20854b:frag:1:2] INFO 
> o.a.d.e.w.fragment.FragmentExecutor - 
> 21eb4fef-3216-f613-4c22-9a88ee20854b:1:2: State change requested 
> AWAITING_ALLOCATION --> FAILED
> 2020-01-07 16:34:28,904 [21eb4fef-3216-f613-4c22-9a88ee20854b:frag:1:6] INFO 
> o.a.d.e.w.fragment.FragmentExecutor - 
> 21eb4fef-3216-f613-4c22-9a88ee20854b:1:6: State change requested FAILED --> 
> FINISHED
> 2020-01-07 16:34:28,904 [21eb4fef-3216-f613-4c22-9a88ee20854b:frag:1:10] INFO 
> o.a.d.e.w.fragment.FragmentExecutor - 
> 21eb4fef-3216-f613-4c22-9a88ee20854b:1:10: State change requested 
> AWAITING_ALLOCATION --> FAILED
> 2020-01-07 16:34:28,904 [21eb4fef-3216-f613-4c22-9a88ee20854b:frag:1:2] INFO 
> o.a.d.e.w.fragment.FragmentExecutor - 
> 21eb4fef-3216-f613-4c22-9a88ee20854b:1:2: State change requested FAILED --> 
> FINISHED
> 2020-01-07 16:34:28,904 [21eb4fef-3216-f613-4c22-9a88ee20854b:frag:1:10] INFO 
> o.a.d.e.w.fragment.FragmentExecutor - 
> 21eb4fef-3216-f613-4c22-9a88ee20854b:1:10: State change requested FAILED --> 
> FINISHED
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to