On Wed, Sep 16, 2009 at 11:16 AM, Jeyendran Balakrishnan <[email protected]> wrote:
> Thanks a lot for the explanation!
>
> From your description, the same HTable shared across all request threads
> [in an app server application] is a no-no, and instantiating a new
> HTable for each request is slow [depending upon application requirements
> of course]
> ==> HTablePool is a logical solution. Sounds like the situation for
> database connections in an app server.
>

I think the 'slow' was the one-time startup cost, so don't rule out the
HTable per Thread.

> Assuming the issue you mentioned [i.e., slow HTablePool startup when
> there are a lot of client threads hitting a big table] is resolved, is
> it safe to say that best practice for HTable access from an app server
> is to use HTablePool?
>

Others may have stronger opinions on this than I. HTablePool will not ride
over a restart of the servers according to a recent issue filed by Barney
Frank (whereas HTable per Thread will).

> One other question:
> Is it safe to cache a single instance of HBaseConfiguration, and then
> pass it to instantiate a new HTable for each client/request thread? Will
> this improve HTable instantiation time?
>

Yes, because you'll be using the same HCM and thus the same cache of region
addresses across all instances.

St.Ack
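As a concrete illustration of the advice in this thread (cache one
HBaseConfiguration for the whole app, give each request thread its own
HTable), here is a minimal sketch against the 0.20-era client API. The
class name and table name are hypothetical, not from the thread:

    import java.io.IOException;

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;

    public class HTablePerThread {
      // One HBaseConfiguration for the whole app server. HTable keys its
      // static HConnectionManager map by HBaseConfiguration instance, so
      // every HTable built from this object shares one HCM and therefore
      // one cache of region locations.
      private static final HBaseConfiguration CONF = new HBaseConfiguration();

      // HTable is not thread-safe (unsynchronized write buffer), so each
      // request thread gets its own instance.
      private static final ThreadLocal<HTable> TABLE = new ThreadLocal<HTable>() {
        @Override
        protected HTable initialValue() {
          try {
            return new HTable(CONF, "mytable"); // hypothetical table name
          } catch (IOException e) {
            throw new RuntimeException(e);
          }
        }
      };

      public static HTable get() {
        return TABLE.get();
      }
    }

Every HTable built this way lands on the same HConnectionManager entry, so
the region-location cache is shared while the unsynchronized write buffer
stays per-thread.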
> Thanks,
> jp
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of stack
> Sent: Wednesday, September 16, 2009 10:16 AM
> To: [email protected]
> Subject: Re: HBase Client Concerns
>
> HTable is not thread-safe (see the javadoc class comment). Use an HTable
> per Thread or use HTablePool. Internally, it carries a write buffer to
> which access is not synchronized (doesn't make sense if you look at the
> code).
>
> Also, you might be interested: HTable in its guts has a static map that is
> keyed by HBaseConfiguration instances. The values are instances of
> HConnectionManager. HCM wraps our (Hadoop's) rpc client code. The rpc
> code maintains a single Connection per remote server. Requests and
> responses are multiplexed over this single Connection. HCM adds caching of
> region locations, so it's good to share HCMs amongst HTables; i.e., pass
> in the same HBaseConfiguration instance.
>
> We're seeing an issue where, if the table is big and hundreds of client
> threads are either carrying their own HTable instance or using a pool,
> startup can take a long time until the cache is filled with sufficient
> region locations. It's being investigated....
>
> St.Ack
>
> On Wed, Sep 16, 2009 at 9:05 AM, Jeyendran Balakrishnan
> <[email protected]> wrote:
>
> > A follow-up question, related to Barney Frank's comment that:
> > "tests with 0.19 that instantiating HTable and HBaseConfiguration() had
> > significant overhead i.e. >25ms."
> > What are the implications of creating HTable just once for a given
> > table at the start of the application/app-server, and using the
> > reference to that instantiated HTable for the duration of the app?
> >
> > Thanks,
> > jp
> >
> > -----Original Message-----
> > From: Barney Frank [mailto:[email protected]]
> > Sent: Tuesday, September 15, 2009 4:41 PM
> > To: [email protected]
> > Subject: Re: HBase Client Concerns
> >
> > My app will be "highly threaded" some day. I was trying to avoid
> > creating another thread for HBase and use the pool instead. About 33%
> > of the requests handled in the app server will need to retrieve data
> > from HBase. I was hoping to leverage the HTablePool rather than
> > managing my own pool or creating another process that requires a
> > thread. It seemed on my earlier tests with 0.19 that instantiating
> > HTable and HBaseConfiguration() had significant overhead, i.e. >25ms.
> >
> > I will file an issue.
> >
> > Thanks.
> >
> > On Tue, Sep 15, 2009 at 5:52 PM, stack <[email protected]> wrote:
> >
> > > On Tue, Sep 15, 2009 at 3:13 PM, Barney Frank
> > > <[email protected]> wrote:
> > > ....
> > >
> > > > **** This is despite the fact that I set hbase.pause to be 25 ms
> > > > and the retries.number = 2. ****
> > >
> > > Yeah, this is down in the guts of the hadoop rpc we use. Around
> > > connection setup it has its own config that is not well aligned with
> > > ours (ours being the retries and pause settings).
> > >
> > > The max retries down in ipc is
> > >
> > > this.maxRetries = conf.getInt("ipc.client.connect.max.retries", 10);
> > >
> > > That's for an IOE other than timeout. For timeout, it does this:
> > >
> > > } catch (SocketTimeoutException toe) {
> > >   /* The max number of retries is 45,
> > >    * which amounts to 20s*45 = 15 minutes retries.
> > >    */
> > >   handleConnectionFailure(timeoutFailures++, 45, toe);
> > >
> > > Let me file an issue to address the above. The retries should be our
> > > retries... and in here it has a hardcoded 1000ms that instead should
> > > be our pause.... Not hard to fix.
> > >
> > > > I restart the Master and RegionServer and then send more client
> > > > requests through HTablePool. It has the same "Retrying to connect
> > > > to server:" messages. I noticed that the port number it is using is
> > > > the old port for the region server and not the new one assigned
> > > > after the restart. The HBaseClient does not seem to recover unless
> > > > I restart the client app. When I do not use HTablePool and only
> > > > HTable, it works fine.
> > >
> > > We've not done work to make the pool ride over a restart.
> > >
> > > > Two issues:
> > > > 1) Setting and using hbase.client.pause and
> > > > hbase.client.retries.number parameters. I have rarely gotten them
> > > > to work. It seems to default to 2 sec and 10 retries no matter if I
> > > > overwrite the defaults on the client and the server. Yes, I made
> > > > sure my client doesn't have anything in the classpath it might
> > > > pick up.
> > > > <property>
> > > >   <name>hbase.client.pause</name>
> > > >   <value>20</value>
> > > > </property>
> > > > <property>
> > > >   <name>hbase.client.retries.number</name>
> > > >   <value>2</value>
> > > > </property>
> > >
> > > Please make an issue for this and I'll investigate. I've already
> > > added a note to an existing HBaseClient ipc issue and will fix the
> > > above items as part of it.
> > >
> > > > 2) Running HTablePool under Pseudo mode, the client doesn't seem to
> > > > refresh with the new regionserver port after the master/regions are
> > > > back up. It gets "stuck" with the info from the settings prior to
> > > > the master going down.
> > > >
> > > > I would appreciate any thoughts or help.
> > >
> > > You need to use the pool? Your app is highly threaded and all are
> > > connecting to hbase (hundreds)?
> > >
> > > St.Ack
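For reference, the HTablePool pattern Barney describes looks roughly like
the sketch below. It assumes the 0.20-era HTablePool API (a
(config, maxSize) constructor with getTable/putTable); the class name,
table name, and pool size are hypothetical:

    import java.io.IOException;

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.HTablePool;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PoolExample {
      // One pool for the whole app server, capped at roughly the number
      // of request threads expected to hit HBase concurrently.
      private static final HTablePool POOL =
          new HTablePool(new HBaseConfiguration(), 100);

      public static Result fetch(String row) throws IOException {
        // Borrow a table from the pool for the duration of one request.
        HTable table = POOL.getTable("mytable"); // hypothetical table name
        try {
          return table.get(new Get(Bytes.toBytes(row)));
        } finally {
          // Return the table to the pool rather than closing it.
          POOL.putTable(table);
        }
      }
    }

Note the finally block: borrowed tables are handed back with putTable, and,
per the discussion above, the pool as it stands will not ride over a
restart of the servers.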

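Lastly, since the retry settings came up: the hbase.client.pause and
hbase.client.retries.number properties from the hbase-site.xml snippet
above can also be set programmatically on the shared configuration before
any HTable is created. A minimal sketch (whether these are honored during
connection setup is exactly the ipc issue stack is filing):

    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class ClientConfig {
      // Build the one shared client configuration, overriding the retry
      // knobs shown in the hbase-site.xml snippet earlier in the thread.
      public static HBaseConfiguration create() {
        HBaseConfiguration conf = new HBaseConfiguration();
        conf.setLong("hbase.client.pause", 20);        // ms between retries
        conf.setInt("hbase.client.retries.number", 2); // retry count
        return conf;
      }
    }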