[
https://issues.apache.org/jira/browse/HBASE-19201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16243810#comment-16243810
]
Lucas Resch commented on HBASE-19201:
-------------------------------------
Today I fixed the problem a little more cleanly by extending HBaseContext
and JavaHBaseContext: I added hbaseBulkLoad and hbaseBulkLoadThinRows wrappers
to the JavaHBaseContext and a conn.close() to both bulk-load functions within
HBaseContext.
{code:scala}
class ExtendedJavaHBaseContext(
    @transient jsc: JavaSparkContext,
    @transient config: Configuration) extends JavaHBaseContext(jsc, config) {

  // Use the extended Scala context so the wrappers below pick up
  // the conn.close() fix.
  override val hbaseContext = new ExtendedHBaseContext(jsc.sc, config)
  ...
  // Java-facing wrappers that delegate to hbaseContext.bulkLoad and
  // hbaseContext.bulkLoadThinRows (full signatures elided).
  def hbaseBulkLoad[T](...)
  def hbaseBulkLoadThinRows[T](...)
}
{code}
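For reference, here is one way such a wrapper might delegate; this is a
hypothetical sketch assuming a Spark 1.x-style FlatMapFunction (whose call()
returns a java.lang.Iterable), not the exact code from my patch:
{code:scala}
import org.apache.hadoop.hbase.TableName
import org.apache.hadoop.hbase.spark.KeyFamilyQualifier
import org.apache.spark.api.java.JavaRDD
import org.apache.spark.api.java.function.FlatMapFunction
import scala.collection.JavaConverters._

// Convert the JavaRDD and the Java function, then delegate to the
// fixed Scala implementation in ExtendedHBaseContext.
def hbaseBulkLoad[T](javaRdd: JavaRDD[T],
                     tableName: TableName,
                     flatMap: FlatMapFunction[T, (KeyFamilyQualifier, Array[Byte])],
                     stagingDir: String): Unit =
  hbaseContext.bulkLoad[T](
    javaRdd.rdd,
    tableName,
    (t: T) => flatMap.call(t).asScala.iterator,
    stagingDir)
{code}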
{code:scala}
class ExtendedHBaseContext(
    sc: SparkContext,
    config: Configuration,
    tmpHdfsConfgFile: String = null)
  extends HBaseContext(sc, config, tmpHdfsConfgFile) {

  override def bulkLoad[T](...): Unit = {
    val conn = ConnectionFactory.createConnection(config)
    ...
    // Close the connection the original implementation leaked.
    conn.close()
  }

  override def bulkLoadThinRows[T](...): Unit = {
    val conn = ConnectionFactory.createConnection(config)
    ...
    conn.close()
  }
}
{code}
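To make the intent concrete, here is a rough usage sketch; the table name,
column family, qualifier and staging directory are placeholders I made up for
illustration:
{code:scala}
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.spark.KeyFamilyQualifier
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("bulkload-example"))
val hbaseContext = new ExtendedHBaseContext(sc, HBaseConfiguration.create())

// Toy RDD of (rowKey, value) pairs.
val rdd = sc.parallelize(Seq(
  (Bytes.toBytes("row1"), Bytes.toBytes("v1")),
  (Bytes.toBytes("row2"), Bytes.toBytes("v2"))))

// Each record is flat-mapped to (KeyFamilyQualifier, value) pairs,
// matching the flatMap signature HBaseContext.bulkLoad expects.
hbaseContext.bulkLoad[(Array[Byte], Array[Byte])](
  rdd,
  TableName.valueOf("myTable"),
  { case (rowKey, value) =>
    Iterator((new KeyFamilyQualifier(rowKey, Bytes.toBytes("cf"), Bytes.toBytes("q")), value))
  },
  "/tmp/bulkload-staging")
{code}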
I also had another look at the newer branches: the first CDH branch that might
fix this problem is cdh5-1.2.0_5.13.0, as it uses an HBaseConnectionCache
instead of creating a connection directly with the ConnectionFactory. I'm not
sure whether that prevents the connection leak, but it might.
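For what it's worth, the leak-avoidance idea behind such a cache can be
sketched like this; this is a deliberately naive toy, not the actual
HBaseConnectionCache API:
{code:scala}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.hbase.client.{Connection, ConnectionFactory}
import scala.collection.mutable

// Naive sketch: hand out one shared Connection per ZooKeeper quorum
// instead of creating a fresh one per bulk load, so repeated streaming
// batches cannot exhaust ZooKeeper's per-client connection limit even
// if a caller never closes what it gets back.
object NaiveConnectionCache {
  private val cache = mutable.Map[String, Connection]()

  def getConnection(conf: Configuration): Connection = cache.synchronized {
    cache.getOrElseUpdate(
      conf.get("hbase.zookeeper.quorum", "default"),
      ConnectionFactory.createConnection(conf))
  }
}
{code}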
> BulkLoading in HBaseContext in hbase-spark does not close connection
> --------------------------------------------------------------------
>
> Key: HBASE-19201
> URL: https://issues.apache.org/jira/browse/HBASE-19201
> Project: HBase
> Issue Type: Bug
> Components: hbase
> Affects Versions: 1.1.12
> Environment: I was using the cdh 5.11.1 version but I checked on the
> newest branch and the problem persists
> Reporter: Lucas Resch
> Labels: newbie
> Original Estimate: 2h
> Remaining Estimate: 2h
>
> Within the hbase-spark module an HBaseContext exists that provides utility
> functions for bulk loading data into HBase. I tried using this function in a
> streaming context, but after a while ZooKeeper denies further connections
> since the maximum number of connections per client is exhausted.
> This issue seems to be within HBaseContext, since the functions bulkLoad and
> bulkLoadThinRows open a connection via the ConnectionFactory but never
> close that connection.
> I copied the needed code into a new Scala project and added a conn.close() at
> the end of each function, and the problem is gone.
> It seems like no one else has had this problem before. I'm guessing that's
> because almost no one uses these functions within a streaming context, and a
> one-time call with RDDs might never reach the upper limit on connections.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)