[ https://issues.apache.org/jira/browse/PHOENIX-4503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Josh Elser resolved PHOENIX-4503.
---------------------------------
    Resolution: Duplicate

Marking as a duplicate of PHOENIX-4489

> Phoenix-Spark plugin doesn't release zookeeper connections
> ----------------------------------------------------------
>
>                 Key: PHOENIX-4503
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4503
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.11.0
>         Environment: HBase 1.2 on Linux (Ubuntu, CentOS)
>            Reporter: Suhas Nalapure
>            Priority: Major
>
> *1. The Phoenix-Spark plugin doesn't release ZooKeeper connections*
> Example:
> {code:java}
> for (int i = 0; i < 50; i++) {
>     Dataset<Row> df = sqlContext.read().format("org.apache.phoenix.spark")
>             .option("table", "\"Sales\"")
>             .option("zkUrl", "localhost:2181")
>             .load();
>     df.show(2);
> }
> Thread.sleep(1000 * 60);
> {code}
> When the above snippet is executed, the number of connections to port 2181 keeps growing, and none are released until the main thread wakes from its sleep and the program ends, as shown below (14 is the number of connections even before the program starts to run):
> netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 14
> 16:52:05
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 22
> 16:52:15
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 38
> 16:52:18
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 68
> 16:52:23
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 100
> 16:52:27
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 116
> 16:52:32
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 116
> 16:52:38
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 116
> 16:52:52
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 116
> 16:53:00
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 116
> 16:53:24
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 14
> 16:53:32
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 14
> 16:53:34
> root@user1 ~ $
> *2. If the "jdbc" format is used instead to create the Spark DataFrame, the connection count doesn't shoot up*
> Example:
> {code:java}
> for (int i = 0; i < 50; i++) {
>     Dataset<Row> df = sqlContext.read().format("jdbc")
>             .option("url", "jdbc:phoenix:localhost:2181")
>             .option("dbtable", "\"Sales\"")
>             .option("driver", "org.apache.phoenix.jdbc.PhoenixDriver")
>             .load();
>     df.show(2);
> }
> Thread.sleep(1000 * 60);
> {code}
> Connection counts during program execution (14 being the count before execution starts):
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 14
> 17:00:42
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 14
> 17:00:43
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 16
> 17:00:46
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 16
> 17:00:50
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 16
> 17:00:55
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 16
> 17:01:12
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 16
> 17:01:18
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 16
> 17:01:28
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 16
> 17:01:34
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 16
> 17:01:37
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 16
> 17:01:39
> root@user1 ~ $ netstat -anp | grep 2181|grep EST| wc -l; date +"%H:%M:%S"
> 14
> 17:02:07

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
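The contrast between the two snippets suggests the phoenix-spark path opens a connection per load() without closing it, while the plain JDBC path releases each connection before the next iteration. The difference between the two patterns can be sketched in plain Java; this is only an illustration of the leak-vs-release behavior, using a hypothetical `FakeConnection` stand-in rather than a live Phoenix/ZooKeeper cluster:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ConnectionReleaseSketch {
    // Counts open handles, the way `netstat ... | grep EST | wc -l`
    // counts ESTABLISHED sockets in the report above.
    static final AtomicInteger OPEN = new AtomicInteger();

    // Hypothetical stand-in for a ZooKeeper-backed connection.
    static class FakeConnection implements AutoCloseable {
        FakeConnection() { OPEN.incrementAndGet(); }
        void query() { /* pretend to read the "Sales" table */ }
        @Override public void close() { OPEN.decrementAndGet(); }
    }

    public static void main(String[] args) {
        // Leaky pattern (like point 1): connections opened in the
        // loop are never closed, so the count climbs with each pass.
        for (int i = 0; i < 50; i++) {
            FakeConnection c = new FakeConnection();
            c.query();
        }
        System.out.println("leaky open count: " + OPEN.get());    // 50

        OPEN.set(0);

        // Releasing pattern (what point 2 effectively achieves):
        // try-with-resources closes each connection before the next
        // one opens, so the count never grows past the loop body.
        for (int i = 0; i < 50; i++) {
            try (FakeConnection c = new FakeConnection()) {
                c.query();
            }
        }
        System.out.println("released open count: " + OPEN.get()); // 0
    }
}
```

In the releasing pattern, the steady netstat count of 16 in point 2 corresponds to at most a couple of short-lived connections in flight at any time, versus the monotonic climb to 116 in point 1.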