On Wed, Sep 16, 2009 at 11:16 AM, Jeyendran Balakrishnan <[email protected]> wrote:
> Thanks a lot for the explanation!
>
> From your description, the same HTable shared across all request threads
> [in an app server application] is a no-no, and instantiating a new
> HTable for each request is slow [depending upon application requirements
> of course]
> ==> HTablePool is a logical solution. Sounds like the situation for
> database connections in an app server.
>

I think the 'slow' was the one-time startup cost, so don't rule out the
HTable per Thread.

> Assuming the issue you mentioned [i.e., slow HTablePool startup when
> there are a lot of client threads hitting a big table] is resolved, is
> it safe to say that best practice for HTable access from an app server
> is to use HTablePool?
>

Others may have stronger opinions on this than I. HTablePool will not ride
over a restart of the servers according to a recent issue filed by Barney
Frank (whereas HTable per Thread will).

> One other question:
> Is it safe to cache a single instance of HBaseConfiguration, and then
> pass it to instantiate a new HTable for each client/request thread? Will
> this improve HTable instantiation time?
>

Yes, because you'll be using the same HCM and thus the same cache of region
addresses across all instances.

St.Ack
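As a concrete illustration of the advice in this thread (cache one
HBaseConfiguration for the whole app, give each request thread its own
HTable), here is a minimal sketch against the 0.20-era client API. The
class name and table name are hypothetical, not from the thread:

    import java.io.IOException;

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;

    public class HTablePerThread {
      // One HBaseConfiguration for the whole app server. HTable keys its
      // static HConnectionManager map by HBaseConfiguration instance, so
      // every HTable built from this object shares one HCM and therefore
      // one cache of region locations.
      private static final HBaseConfiguration CONF = new HBaseConfiguration();

      // HTable is not thread-safe (unsynchronized write buffer), so each
      // request thread gets its own instance.
      private static final ThreadLocal<HTable> TABLE = new ThreadLocal<HTable>() {
        @Override
        protected HTable initialValue() {
          try {
            return new HTable(CONF, "mytable"); // hypothetical table name
          } catch (IOException e) {
            throw new RuntimeException(e);
          }
        }
      };

      public static HTable get() {
        return TABLE.get();
      }
    }

Every HTable built this way lands on the same HConnectionManager entry, so
the region-location cache is shared while the unsynchronized write buffer
stays per-thread.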
> Thanks,
> jp
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of stack
> Sent: Wednesday, September 16, 2009 10:16 AM
> To: [email protected]
> Subject: Re: HBase Client Concerns
>
> HTable is not thread-safe (see the javadoc class comment). Use an HTable
> per Thread or use HTablePool. Internally, it carries a write buffer to
> which access is not synchronized (doesn't make sense if you look at the
> code).
>
> Also, you might be interested: HTable in its guts has a static map that is
> keyed by HBaseConfiguration instances. The values are instances of
> HConnectionManager. HCM wraps our (Hadoop's) rpc client code. The rpc
> code maintains a single Connection per remote server. Requests and
> responses are multiplexed over this single Connection. HCM adds caching of
> region locations, so it's good to share HCMs amongst HTables; i.e., pass
> in the same HBaseConfiguration instance.
>
> We're seeing an issue where, if the table is big and hundreds of client
> threads are either carrying their own HTable instance or using a pool,
> startup can take a long time until the cache is filled with sufficient
> region locations. It's being investigated....
>
> St.Ack
>
> On Wed, Sep 16, 2009 at 9:05 AM, Jeyendran Balakrishnan
> <[email protected]> wrote:
>
> > A follow-up question, related to Barney Frank's comment that:
> > "tests with 0.19 that instantiating HTable and HBaseConfiguration() had
> > significant overhead i.e. >25ms."
> > What are the implications of creating HTable just once for a given
> > table at the start of the application/app-server, and using the
> > reference to that instantiated HTable for the duration of the app?
> >
> > Thanks,
> > jp
> >
> > -----Original Message-----
> > From: Barney Frank [mailto:[email protected]]
> > Sent: Tuesday, September 15, 2009 4:41 PM
> > To: [email protected]
> > Subject: Re: HBase Client Concerns
> >
> > My app will be "highly threaded" some day. I was trying to avoid
> > creating another thread for HBase and use the pool instead. About 33%
> > of the requests handled in the app server will need to retrieve data
> > from HBase. I was hoping to leverage the HTablePool rather than
> > managing my own pool or creating another process that requires a
> > thread. It seemed on my earlier tests with 0.19 that instantiating
> > HTable and HBaseConfiguration() had significant overhead, i.e. >25ms.
> >
> > I will file an issue.
> >
> > Thanks.
> >
> > On Tue, Sep 15, 2009 at 5:52 PM, stack <[email protected]> wrote:
> >
> > > On Tue, Sep 15, 2009 at 3:13 PM, Barney Frank
> > > <[email protected]> wrote:
> > > ....
> > >
> > > > **** This is despite the fact that I set hbase.pause to be 25 ms
> > > > and the retries.number = 2. ****
> > >
> > > Yeah, this is down in the guts of the hadoop rpc we use. Around
> > > connection setup it has its own config that is not well aligned with
> > > ours (ours being the retries and pause settings).
> > >
> > > The max retries down in ipc is
> > >
> > > this.maxRetries = conf.getInt("ipc.client.connect.max.retries", 10);
> > >
> > > That's for an IOE other than timeout. For timeout, it does this:
> > >
> > > } catch (SocketTimeoutException toe) {
> > >   /* The max number of retries is 45,
> > >    * which amounts to 20s*45 = 15 minutes retries.
> > >    */
> > >   handleConnectionFailure(timeoutFailures++, 45, toe);
> > >
> > > Let me file an issue to address the above. The retries should be our
> > > retries... and in here it has a hardcoded 1000ms that instead should
> > > be our pause.... Not hard to fix.
> > >
> > > > I restart the Master and RegionServer and then send more client
> > > > requests through HTablePool. It has the same "Retrying to connect
> > > > to server:" messages. I noticed that the port number it is using is
> > > > the old port for the region server and not the new one assigned
> > > > after the restart. The HBaseClient does not seem to recover unless
> > > > I restart the client app. When I do not use HTablePool and only
> > > > HTable, it works fine.
> > >
> > > We've not done work to make the pool ride over a restart.
> > >
> > > > Two issues:
> > > > 1) Setting and using hbase.client.pause and
> > > > hbase.client.retries.number parameters. I have rarely gotten them
> > > > to work. It seems to default to 2 sec and 10 retries no matter if I
> > > > overwrite the defaults on the client and the server. Yes, I made
> > > > sure my client doesn't have anything in the classpath it might
> > > > pick up.
> > > > <property>
> > > >   <name>hbase.client.pause</name>
> > > >   <value>20</value>
> > > > </property>
> > > > <property>
> > > >   <name>hbase.client.retries.number</name>
> > > >   <value>2</value>
> > > > </property>
> > >
> > > Please make an issue for this and I'll investigate. I've already
> > > added a note to an existing HBaseClient ipc issue and will fix the
> > > above items as part of it.
> > >
> > > > 2) Running HTablePool under Pseudo mode, the client doesn't seem to
> > > > refresh with the new regionserver port after the master/regions are
> > > > back up. It gets "stuck" with the info from the settings prior to
> > > > the master going down.
> > > >
> > > > I would appreciate any thoughts or help.
> > >
> > > You need to use the pool? Your app is highly threaded and all are
> > > connecting to hbase (hundreds)?
> > >
> > > St.Ack
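For reference, the HTablePool pattern Barney describes looks roughly like
the sketch below. It assumes the 0.20-era HTablePool API (a
(config, maxSize) constructor with getTable/putTable); the class name,
table name, and pool size are hypothetical:

    import java.io.IOException;

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.HTablePool;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PoolExample {
      // One pool for the whole app server, capped at roughly the number
      // of request threads expected to hit HBase concurrently.
      private static final HTablePool POOL =
          new HTablePool(new HBaseConfiguration(), 100);

      public static Result fetch(String row) throws IOException {
        // Borrow a table from the pool for the duration of one request.
        HTable table = POOL.getTable("mytable"); // hypothetical table name
        try {
          return table.get(new Get(Bytes.toBytes(row)));
        } finally {
          // Return the table to the pool rather than closing it.
          POOL.putTable(table);
        }
      }
    }

Note the finally block: borrowed tables are handed back with putTable, and,
per the discussion above, the pool as it stands will not ride over a
restart of the servers.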

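Lastly, since the retry settings came up: the hbase.client.pause and
hbase.client.retries.number properties from the hbase-site.xml snippet
above can also be set programmatically on the shared configuration before
any HTable is created. A minimal sketch (whether these are honored during
connection setup is exactly the ipc issue stack is filing):

    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class ClientConfig {
      // Build the one shared client configuration, overriding the retry
      // knobs shown in the hbase-site.xml snippet earlier in the thread.
      public static HBaseConfiguration create() {
        HBaseConfiguration conf = new HBaseConfiguration();
        conf.setLong("hbase.client.pause", 20);        // ms between retries
        conf.setInt("hbase.client.retries.number", 2); // retry count
        return conf;
      }
    }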