Thanks a lot for the explanation!

From your description, the same HTable shared across all request threads
[in an app server application] is a no-no, and instantiating a new
HTable for each request is slow [depending upon application requirements
of course]
==> HTablePool is a logical solution. Sounds like the situation for
database connections in an app server.

Assuming the issue you mentioned [i.e., slow HTablePool startup when
there are a lot of client threads hitting a big table] is resolved, is
it safe to say that best practice for HTable access from an app server
is to use HTablePool?
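
For reference, the usage I have in mind is something like the sketch
below (API names as I understand the 0.20 HTablePool, so please correct
me if I have them wrong; table name and pool size are made up):

    HTablePool pool = new HTablePool(conf, 100);  // at most 100 HTables
    HTable table = pool.getTable("mytable");
    try {
      // ... Gets/Puts against table ...
    } finally {
      pool.putTable(table);  // return the HTable to the pool for reuse
    }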

One other question:
Is it safe to cache a single instance of HBaseConfiguration, and then
pass it to instantiate a new HTable for each client/request thread? Will
this improve HTable instantiation time?
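
i.e., something like this (hypothetical sketch, table name made up):

    // Created once at app startup and cached:
    HBaseConfiguration conf = new HBaseConfiguration();
    // Then, per client/request thread:
    HTable table = new HTable(conf, "mytable");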

Thanks,
jp


-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of
stack
Sent: Wednesday, September 16, 2009 10:16 AM
To: [email protected]
Subject: Re: HBase Client Concerns

HTable is not thread-safe (see the javadoc class comment).  Use an
HTable per thread, or use HTablePool.  Internally, HTable carries a
write buffer to which access is not synchronized (if you look at the
code, sharing it across threads makes no sense).
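
A minimal sketch of the per-thread option (assumes a table named
"mytable"; this is just one way to do it):

    import java.io.IOException;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;

    // One HBaseConfiguration shared by all threads; one HTable per
    // thread, since HTable itself is not thread-safe.
    private static final HBaseConfiguration CONF = new HBaseConfiguration();
    private static final ThreadLocal<HTable> TABLE = new ThreadLocal<HTable>() {
      @Override protected HTable initialValue() {
        try {
          return new HTable(CONF, "mytable");
        } catch (IOException e) {
          throw new RuntimeException(e);
        }
      }
    };
    // Request threads then call TABLE.get().get(...)/put(...) as usual.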

Also, you might be interested: HTable in its guts has a static map that
is keyed by HBaseConfiguration instances.  The values are instances of
HConnectionManager.  HCM wraps our (Hadoop's) rpc client code.  The rpc
code maintains a single Connection per remote server.  Requests and
responses are multiplexed over this single Connection.  HCM adds caching
of region locations, so it's good to share HCMs amongst HTables, i.e. by
passing in the same HBaseConfiguration instance.
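
So, roughly (sketch; table names made up):

    HBaseConfiguration conf = new HBaseConfiguration();
    HTable a = new HTable(conf, "t1");  // a and b share one HCM and
    HTable b = new HTable(conf, "t2");  // its region-location cache
    HTable c = new HTable(new HBaseConfiguration(), "t1");  // new HCM,
                                                            // cold cache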

We're seeing an issue where, if the table is big and hundreds of client
threads are each carrying their own HTable instance or using a pool,
startup can take a long time until the cache is filled with sufficient
region locations.  It's being investigated.
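
One possible stopgap (untested sketch; assumes getStartKeys and
getRegionLocation are present in your HTable version): walk the region
start keys once at startup so the location cache is warm before real
traffic arrives.

    // Hypothetical warm-up; run once at app start.
    HTable t = new HTable(conf, "mytable");
    for (byte[] startKey : t.getStartKeys()) {
      t.getRegionLocation(startKey);  // caches that region's location
    }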

St.Ack


On Wed, Sep 16, 2009 at 9:05 AM, Jeyendran Balakrishnan <
[email protected]> wrote:

> A follow-up question, related to Barney Frank's comment that:
> "tests with 0.19 that instantiating Htable and HBaseConfuguration()
had
> significant overhead i.e. >25ms."
> What are the implications of creating HTable just once for a given
table
> at the start of the application/app-server, and using the reference to
> that  instantiated HTable for the duration of the app?
>
> Thanks,
> jp
>
>
> -----Original Message-----
> From: Barney Frank [mailto:[email protected]]
> Sent: Tuesday, September 15, 2009 4:41 PM
> To: [email protected]
> Subject: Re: HBase Client Concerns
>
> My app will be "highly threaded" some day.  I was trying to avoid
> creating another thread for HBase and use the pool instead.  About 33%
> of the requests handled in the app server will need to retrieve data
> from HBase.  I was hoping to leverage the HTablePool rather than
> managing my own pool or creating another process that requires a
> thread.  It seemed on my earlier tests with 0.19 that instantiating
> HTable and HBaseConfiguration() had significant overhead, i.e. >25ms.
>
>
> I will file an issue.
>
> Thanks.
>
>
> On Tue, Sep 15, 2009 at 5:52 PM, stack <[email protected]> wrote:
>
> > On Tue, Sep 15, 2009 at 3:13 PM, Barney Frank
> > <[email protected]> wrote:
> > ....
> >
> >
> > > **** This is despite the fact that I set hbase.pause to be 25 ms
> > > and the retries.number = 2.  ****
> > >
> > >
> > Yeah, this is down in the guts of the Hadoop rpc we use.  Around
> > connection setup it has its own config that is not well aligned with
> > ours (ours being the retries and pause settings).
> >
> > The max retries down in ipc is
> >
> > this.maxRetries = conf.getInt("ipc.client.connect.max.retries", 10);
> >
> > That's for an IOE other than timeout.  For timeout, it does this:
> >
> >          } catch (SocketTimeoutException toe) {
> >            /* The max number of retries is 45,
> >             * which amounts to 20s*45 = 15 minutes retries.
> >             */
> >            handleConnectionFailure(timeoutFailures++, 45, toe);
> >
> > Let me file an issue to address the above.  The retries should be
> > our retries... and in here it has a hardcoded 1000ms that instead
> > should be our pause.  Not hard to fix.
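> >
> > In the meantime, if the non-timeout retries are biting you, you could
> > try overriding the ipc key directly in your client-side config (a
> > sketch; note it does not help the hardcoded SocketTimeoutException
> > path above):
> >
> > <property>
> >   <name>ipc.client.connect.max.retries</name>
> >   <value>2</value>
> > </property>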
> >
> >
> >
> > > I restart the Master and RegionServer and then send more client
> > > requests through HTablePool.  It has the same "Retrying to connect
> > > to server:" messages.  I noticed that the port number it is using
> > > is the old port for the region server and not the new one assigned
> > > after the restart.  The HBaseClient does not seem to recover unless
> > > I restart the client app.  When I do not use HTablePool and only
> > > HTable it works fine.
> > >
> >
> >
> > We've not done work to make the pool ride over a restart.
> >
> >
> >
> > > Two issues:
> > > 1) Setting and using the hbase.client.pause and
> > > hbase.client.retries.number parameters.  I have rarely gotten them
> > > to work.  It seems to default to 2 sec and 10 retries no matter if
> > > I override the defaults on the client and the server.  Yes, I made
> > > sure my client doesn't have anything in the classpath it might
> > > pick up.
> > > <property>
> > >   <name>hbase.client.pause</name>
> > >   <value>20</value>
> > > </property>
> > > <property>
> > >   <name>hbase.client.retries.number</name>
> > >   <value>2</value>
> > > </property>
> > >
> >
> >
> > Please make an issue for this and I'll investigate.  I've already
> > added a note to an existing HBaseClient ipc issue and will fix the
> > above items as part of it.
> >
> >
> >
> > > 2) Running HTablePool under pseudo-distributed mode, the client
> > > doesn't seem to refresh with the new regionserver port after the
> > > master/regions are back up.  It gets "stuck" with the info from
> > > the settings prior to the master going down.
> > >
> > > I would appreciate any thoughts or help.
> > >
> >
> >
> > You need to use the pool?  Your app is highly threaded and all are
> > connecting to hbase (hundreds)?
> >
> > St.Ack
> >
>
