[ https://issues.apache.org/jira/browse/HBASE-19201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16243810#comment-16243810 ]

Lucas Resch commented on HBASE-19201:
-------------------------------------

Today I fixed the problem a little more cleanly by extending HBaseContext and 
JavaHBaseContext: I added hbaseBulkLoad and hbaseBulkLoadThinRows to the 
JavaHBaseContext, and a conn.close() to bulkLoad and bulkLoadThinRows within 
HBaseContext.

{code:scala}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.hbase.spark.JavaHBaseContext
import org.apache.spark.api.java.JavaSparkContext

class ExtendedJavaHBaseContext(
  @transient jsc: JavaSparkContext,
  @transient config: Configuration) extends JavaHBaseContext(jsc, config) {

  // Point the Java wrapper at the extended Scala context so the
  // connection-closing overrides are actually used.
  override val hbaseContext = new ExtendedHBaseContext(jsc.sc, config)
  ...
  // New Java-facing entry points that delegate to hbaseContext.
  def hbaseBulkLoad[T](...)
  def hbaseBulkLoadThinRows[T](...)

}
{code}
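
Since JavaHBaseContext delegates its work to the hbaseContext field, the new 
hbaseBulkLoad and hbaseBulkLoadThinRows entry points pick up the closing 
behaviour of ExtendedHBaseContext automatically.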


{code:scala}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.hbase.client.ConnectionFactory
import org.apache.hadoop.hbase.spark.HBaseContext
import org.apache.spark.SparkContext

class ExtendedHBaseContext(
  sc: SparkContext,
  config: Configuration,
  tmpHdfsConfgFile: String = null)
  extends HBaseContext(sc, config, tmpHdfsConfgFile) {

  override def bulkLoad[T](...): Unit = {
    val conn = ConnectionFactory.createConnection(config)
    try {
      ...
    } finally {
      // Close even if the load fails, so long-running streaming jobs
      // do not exhaust ZooKeeper's per-client connection limit.
      conn.close()
    }
  }

  override def bulkLoadThinRows[T](...): Unit = {
    val conn = ConnectionFactory.createConnection(config)
    try {
      ...
    } finally {
      conn.close()
    }
  }
}
{code}
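
For completeness, here is a hedged sketch of how the extended context might be 
called; the bulkLoad signature and the KeyFamilyQualifier row layout are my 
assumptions based on the hbase-spark module and may differ between branches, and 
the table name, column family, and staging path are placeholders:

{code:scala}
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.spark.KeyFamilyQualifier
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.SparkContext

val sc = new SparkContext("local[2]", "bulkload-example")
val hbaseContext = new ExtendedHBaseContext(sc, HBaseConfiguration.create())

val rdd = sc.parallelize(Seq(("row1", "v1"), ("row2", "v2")))

// Each input record is flattened to (rowKey/family/qualifier, value) pairs.
hbaseContext.bulkLoad[(String, String)](
  rdd,
  TableName.valueOf("example_table"),
  { case (rowKey, value) =>
    Iterator((new KeyFamilyQualifier(
      Bytes.toBytes(rowKey), Bytes.toBytes("cf"), Bytes.toBytes("q")),
      Bytes.toBytes(value)))
  },
  "/tmp/hbase-bulkload-staging")  // staging dir for the generated HFiles
{code}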

I also had another look at newer branches: the first CDH branch that might fix 
this problem is cdh5-1.2.0_5.13.0, as it uses an HBaseConnectionCache instead of 
creating a connection with the ConnectionFactory. I'm not sure whether that 
prevents the connection leak, but it might.
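
The caching idea, reduced to a minimal illustrative sketch (this is not the 
actual HBaseConnectionCache API; the object name and keying scheme are made up 
for illustration): one connection is created per configuration key and reused 
across bulk loads, so repeated calls in a streaming job no longer open new 
ZooKeeper connections.

{code:scala}
import scala.collection.mutable
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.hbase.client.{Connection, ConnectionFactory}

object SimpleConnectionCache {
  private val cache = mutable.Map[String, Connection]()

  def getConnection(config: Configuration): Connection = synchronized {
    // Key on the ZooKeeper quorum; a real cache would use a richer key.
    val key = config.get("hbase.zookeeper.quorum", "default")
    cache.getOrElseUpdate(key, ConnectionFactory.createConnection(config))
  }
}
{code}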

> BulkLoading in HBaseContext in hbase-spark does not close connection
> --------------------------------------------------------------------
>
>                 Key: HBASE-19201
>                 URL: https://issues.apache.org/jira/browse/HBASE-19201
>             Project: HBase
>          Issue Type: Bug
>          Components: hbase
>    Affects Versions: 1.1.12
>         Environment: I was using the CDH 5.11.1 version, but I checked on the 
> newest branch and the problem persists
>            Reporter: Lucas Resch
>              Labels: newbie
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Within the hbase-spark module an HBaseContext exists that provides utility 
> functions for bulk-loading data into HBase. I tried using this function in a 
> streaming context, but after a while ZooKeeper denies further connections 
> since the maximum number of connections per client is exhausted. 
> This issue seems to be within HBaseContext, since the functions bulkLoad and 
> bulkLoadThinRows open a connection via the ConnectionFactory but never 
> close that connection.
> I copied the needed code into a new Scala project and added a conn.close() at 
> the end of each function, and the problem is gone. 
> It seems like no one else has had this problem before. I'm guessing that's 
> because almost no one uses this function within a streaming context, and a 
> one-time call to it with RDDs might never reach that upper limit on connections. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
