[ 
https://issues.apache.org/jira/browse/DRILL-6935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ken updated DRILL-6935:
-----------------------
    Description: 
Hi Team, 

Hope all is good.

We need your help. 

Here is the Apache Drill process running on our server.

drill 19220 1 17 16:48 ? 00:15:32 /usr/java/jdk/bin/java -Xms8G -Xmx8G -XX:MaxDirectMemorySize=96G -XX:ReservedCodeCacheSize=1024m -Ddrill.exec.enable-epoll=false -XX:+CMSClassUnloadingEnabled -XX:+UseG1GC -Dlog.path=/var/log/drill/drillbit.log -Dlog.query.path=/var/log/drill/drillbit_queries.json -cp /usr/local/apache-drill-1.13.1/conf:/usr/local/apache-drill-1.13.1/jars/*:/usr/local/apache-drill-1.13.1/jars/ext/*:/usr/local/apache-drill-1.13.1/jars/3rdparty/*:/usr/local/apache-drill-1.13.1/jars/classb/*:/usr/local/apache-drill-1.13.1/jars/3rdparty/linux/* org.apache.drill.exec.server.Drillbit
root 23651 23227 0 18:16 pts/1 00:00:00 grep --color=auto java

Question 1:

There are a lot of connections in the CLOSE_WAIT state when I access Apache 
Drill at https://<ip address>:8047 (https://theremin.digitalalchemy.net.au:8047/). 
I have replaced our server IP with xxxx in the listing for security reasons. 
Because of these connections, we can't access Apache Drill at 
https://<ip address>:8047, so we can't check which SQL query failed.

tcp6    0  0  :::8047            :::*                   LISTEN       19220/java
tcp6  518  0  192.168.xxxx:8047  192.168.100.131:54132  CLOSE_WAIT   19220/java
tcp6    1  0  192.168.xxxx:8047  192.168.100.222:52986  CLOSE_WAIT   19220/java
tcp6  518  0  192.168.xxxx:8047  192.168.100.222:53009  CLOSE_WAIT   19220/java
tcp6  518  0  192.168.xxxx:8047  192.168.100.131:54131  CLOSE_WAIT   19220/java
tcp6    1  0  192.168.xxxx:8047  192.168.3.119:61202    CLOSE_WAIT   19220/java
tcp6  518  0  192.168.xxxx:8047  192.168.100.131:54366  CLOSE_WAIT   19220/java
tcp6  518  0  192.168.xxxx:8047  192.168.100.131:54129  CLOSE_WAIT   19220/java
tcp6  518  0  192.168.xxxx:8047  192.168.100.131:58627  CLOSE_WAIT   19220/java
tcp6  518  0  192.168.xxxx:8047  192.168.100.131:58486  CLOSE_WAIT   19220/java
tcp6  518  0  192.168.xxxx:8047  192.168.100.131:54134  CLOSE_WAIT   19220/java
tcp6  518  0  192.168.xxxx:8047  192.168.100.222:53008  CLOSE_WAIT   19220/java
tcp6    1  0  192.168.xxxx:8047  192.168.3.119:56226    CLOSE_WAIT   19220/java
tcp6  518  0  192.168.xxxx:8047  192.168.100.222:52991  CLOSE_WAIT   19220/java
tcp6    1  0  192.168.xxxx:8047  192.168.3.119:51172    CLOSE_WAIT   19220/java
tcp6    1  0  192.168.xxxx:8047  192.168.3.119:36136    CLOSE_WAIT   19220/java
tcp6  518  0  192.168.xxxx:8047  192.168.100.131:54133  CLOSE_WAIT   19220/java
tcp6   24  0  192.168.xxxx:8047  192.168.100.131:57474  ESTABLISHED  19220/java
tcp6  518  0  192.168.xxxx:8047  192.168.100.131:54069  CLOSE_WAIT   19220/java
tcp6  518  0  192.168.xxxx:8047  192.168.100.131:54130  CLOSE_WAIT   19220/java
tcp6  518  0  192.168.xxxx:8047  192.168.100.222:53001  CLOSE_WAIT   19220/java
tcp6  518  0  192.168.xxxx:8047  192.168.100.222:52985  CLOSE_WAIT   19220/java
tcp6  518  0  192.168.xxxx:8047  192.168.100.222:52990  CLOSE_WAIT   19220/java
tcp6  518  0  192.168.xxxx:8047  192.168.100.131:54212  CLOSE_WAIT   19220/java
tcp6    1  0  192.168.xxxx:8047  192.168.100.131:58628  CLOSE_WAIT   19220/java
tcp6    1  0  192.168.xxxx:8047  192.168.100.131:53955  CLOSE_WAIT   19220/java
tcp6  518  0  192.168.xxxx:8047  192.168.100.131:57391  CLOSE_WAIT   19220/java
tcp6    1  0  192.168.xxxx:8047  192.168.3.119:41219    CLOSE_WAIT   19220/java
tcp6  518  0  192.168.xxxx:8047  192.168.100.131:54307  CLOSE_WAIT   19220/java
tcp6  518  0  192.168.xxxx:8047  192.168.100.222:53000  CLOSE_WAIT   19220/java
tcp6  518  0  192.168.xxxx:8047  192.168.100.222:52984  CLOSE_WAIT   19220/java
tcp6  518  0  192.168.xxxx:8047  192.168.100.131:54308  CLOSE_WAIT   19220/java
tcp6    1  0  192.168.xxxx:8047  192.168.3.119:46189    CLOSE_WAIT   19220/java
tcp6  518  0  192.168.xxxx:8047  192.168.100.131:54211  CLOSE_WAIT   19220/java
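
To make a listing like the one above easier to summarize, here is a minimal Python sketch that tallies connection states on port 8047 and counts CLOSE_WAIT sockets per remote host. It assumes the rows follow the column order seen above (protocol, Recv-Q, Send-Q, local address, remote address, state, PID/program); the function name tally_states and the SAMPLE rows (a small subset copied from the listing) are illustrative only, not part of Drill.

```python
from collections import Counter

# A few rows copied from the listing above; the column order is assumed to be
# proto, Recv-Q, Send-Q, local address, remote address, state, PID/program.
SAMPLE = """\
tcp6 0 0 :::8047 :::* LISTEN 19220/java
tcp6 518 0 192.168.xxxx:8047 192.168.100.131:54132 CLOSE_WAIT 19220/java
tcp6 1 0 192.168.xxxx:8047 192.168.100.222:52986 CLOSE_WAIT 19220/java
tcp6 24 0 192.168.xxxx:8047 192.168.100.131:57474 ESTABLISHED 19220/java
tcp6 1 0 192.168.xxxx:8047 192.168.3.119:61202 CLOSE_WAIT 19220/java
"""


def tally_states(listing, port="8047"):
    """Count socket states on `port`, plus CLOSE_WAIT sockets per remote host."""
    by_state = Counter()
    close_wait_by_peer = Counter()
    for line in listing.splitlines():
        fields = line.split()
        # Skip anything that is not a TCP row with at least a state column.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, remote, state = fields[3], fields[4], fields[5]
        if not local.endswith(":" + port):
            continue
        by_state[state] += 1
        if state == "CLOSE_WAIT":
            # Drop the ephemeral client port so counts group by host.
            close_wait_by_peer[remote.rsplit(":", 1)[0]] += 1
    return by_state, close_wait_by_peer


if __name__ == "__main__":
    states, peers = tally_states(SAMPLE)
    print(dict(states))
    print(dict(peers))
```

Grouping by remote host may show whether the half-closed connections come from one client or from several.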

Question 2:

Our Apache Drill goes down frequently, apparently because of a memory leak. 
We have configured 96G of memory for Apache Drill, so can you please advise 
how we can identify which SQL query used a lot of memory, and how we can 
improve performance?

Error Id: 40d789a6-91ee-4e0b-bfc9-a26358a43df3 on server:31010]
org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: IllegalStateException: Memory was leaked by query. Memory leaked: (67043328)
Allocator(op:14:0:0:HashPartitionSender) 1000000/67043328/101535744/10000000000 (res/actual/peak/limit)

Fragment 14:0

[Error Id: 40d789a6-91ee-4e0b-bfc9-a26358a43df3 on theremin.root.digitalalchemy:31010]
    at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633) ~[drill-common-1.13.0.jar:1.13.0]
    at org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:300) [drill-java-exec-1.13.0.jar:1.13.0]
    at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:160) [drill-java-exec-1.13.0.jar:1.13.0]
    at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:266) [drill-java-exec-1.13.0.jar:1.13.0]
    at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.13.0.jar:1.13.0]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_161]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_161]
    at java.lang.Thread.run(Thread.java:748) [na:1.8.0_161]
)

Thank you.


> Apache Drill shows a lot of CLOSE_WAIT states when we access 
> https://<ip address>:8047, which makes the URL unavailable
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-6935
>                 URL: https://issues.apache.org/jira/browse/DRILL-6935
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Functions - Drill
>    Affects Versions: 1.13.0
>            Reporter: ken
>            Priority: Critical



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
