[ 
https://issues.apache.org/jira/browse/PHOENIX-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16315655#comment-16315655
 ] 

Karan Mehta commented on PHOENIX-4489:
--------------------------------------

[~vincentpoon] 
Technically, as we discussed, it shouldn't be a problem, since the connection 
object goes out of scope right after the generateSplits() method is executed 
and should be garbage collected. However, if you check out PHOENIX-4503, the 
client is trying to read multiple Spark dataframes inside a loop (almost 50 
times). Such code executes quickly and creates lots of HConnections and 
ZKConnections in a short span of time. I suspect that even though GC is 
eventually triggered to clear them, it might take some time before that 
happens (until the JVM feels the need). This can cause issues with the 
application. I see many issues filed in this regard. 
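
To illustrate the concern, here is a minimal, self-contained sketch. It does 
not use the real HBase or Phoenix classes: FakeConnection, SplitLoopDemo and 
both generateSplits variants are hypothetical stand-ins, and the counter only 
models "open until close() is called", which is the worst case when GC has not 
yet run. It compares a 50-iteration loop that relies on scope alone with one 
that closes the connection explicitly.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for an HBase connection; tracks how many are open. */
class FakeConnection implements AutoCloseable {
    static final AtomicInteger OPEN = new AtomicInteger();

    FakeConnection() {
        OPEN.incrementAndGet();
    }

    @Override
    public void close() {
        OPEN.decrementAndGet();
    }
}

class SplitLoopDemo {
    /** Mimics a generateSplits() that leaves cleanup to the garbage collector. */
    static void generateSplitsLeaky() {
        FakeConnection conn = new FakeConnection();
        // ... compute splits; conn goes out of scope but close() is never called
    }

    /** Same work, but the connection is released deterministically. */
    static void generateSplitsClosed() {
        try (FakeConnection conn = new FakeConnection()) {
            // ... compute splits
        }
    }

    /** Runs the chosen variant in a loop and returns how many connections remain open. */
    static int runLoop(int iterations, boolean closeExplicitly) {
        FakeConnection.OPEN.set(0);
        for (int i = 0; i < iterations; i++) {
            if (closeExplicitly) {
                generateSplitsClosed();
            } else {
                generateSplitsLeaky();
            }
        }
        return FakeConnection.OPEN.get();
    }

    public static void main(String[] args) {
        System.out.println("open after leaky loop:  " + runLoop(50, false));
        System.out.println("open after closed loop: " + runLoop(50, true));
    }
}
```

With try-with-resources the number of open connections does not depend on 
when the JVM decides to collect garbage, which is the property the loop in 
PHOENIX-4503 seems to need.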

Also, since the connections are not instantiated via a factory, it is 
difficult to track how many of them exist and to limit resources by plugging 
in a custom implementation. What do you think?
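
As a rough sketch of why a factory helps here: if every connection is created 
through one interface, a custom implementation can count and cap them. All 
names below (TrackedConnection, ConnFactory, CappedConnFactory) are 
hypothetical and are not the Phoenix or HBase API; the example only shows the 
pattern in plain Java.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical connection handle; close() releases its slot in the factory. */
interface TrackedConnection extends AutoCloseable {
    @Override
    void close();
}

/** Hypothetical factory interface that all callers would go through. */
interface ConnFactory {
    TrackedConnection create();
}

/** Custom implementation that counts open connections and enforces a cap. */
class CappedConnFactory implements ConnFactory {
    private final int maxOpen;
    private final AtomicInteger open = new AtomicInteger();

    CappedConnFactory(int maxOpen) {
        this.maxOpen = maxOpen;
    }

    @Override
    public TrackedConnection create() {
        if (open.incrementAndGet() > maxOpen) {
            open.decrementAndGet(); // undo the reservation before failing
            throw new IllegalStateException("too many open connections, limit is " + maxOpen);
        }
        // Guard so that calling close() twice releases the slot only once.
        AtomicBoolean closed = new AtomicBoolean(false);
        return () -> {
            if (closed.compareAndSet(false, true)) {
                open.decrementAndGet();
            }
        };
    }

    int openCount() {
        return open.get();
    }
}

class CappedFactoryDemo {
    public static void main(String[] args) {
        CappedConnFactory factory = new CappedConnFactory(2);
        TrackedConnection first = factory.create();
        TrackedConnection second = factory.create();
        try {
            factory.create(); // third request exceeds the cap
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        first.close();
        second.close();
        System.out.println("open now: " + factory.openCount());
    }
}
```

Code that calls new directly on a connection class bypasses this kind of 
accounting entirely, which is the limitation described above.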

FYI, [~aertoria]

> HBase Connection leak in Phoenix MR Jobs
> ----------------------------------------
>
>                 Key: PHOENIX-4489
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4489
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Karan Mehta
>            Assignee: Karan Mehta
>         Attachments: PHOENIX-4489.001.patch
>
>
> Phoenix MR jobs use a custom class {{PhoenixInputFormat}} to determine the 
> splits and the parallelism of the work. The class directly opens an HBase 
> connection, which is not closed after use. Independently running MR jobs 
> should not be affected; however, jobs that run through Phoenix-Spark can 
> leak connections if this is left unclosed (since those jobs run as part of 
> the same JVM). 
> Apart from this, the connection should be instantiated with 
> {{HBaseFactoryProvider.getHConnectionFactory()}} instead of the default one. 
> This is useful if a separate client is trying to run jobs and wants to 
> provide a custom implementation of {{HConnection}}. 
> [~jmahonin] Any ideas?
> [~jamestaylor] [~vincentpoon] Any concerns around this?



