Let's do a little quiz:

HTable t1 = new HTable(conf);
t1.close();
// 1. Will the next line create a new HConnection behind the scenes (along with re-creating all the caches)?
// (If so, it will be expensive; if not, when is the first HConnection actually released?)
HTable t2 = new HTable(conf);
// 2. How about this one?
HTable t2 = new HTable(new Configuration(conf));
// 3. Or now?
conf.setInt(HConstants.HBASE_CLIENT_PAUSE, 2000);
HTable t3 = new HTable(conf);
// 4. And now?
conf.setInt(HBASE_CLIENT_SCANNER_MAX_RESULT_SIZE_KEY, 1024000);
HTable t4 = new HTable(conf);
// 5. How many connections are open now?
t4.close();

This stuff is convoluted and needlessly complicated. And this is not because the code is bad, but because the abstraction is simply inadequate.
A client wants to connect to a cluster and then do some action on that cluster (via HTable as a convenience). If the cluster connection is implicit, it leads to all of the above considerations.

(#1: Yes, #2: no, #3: yes, #4: no, #5: I don't really know, I'd have to run it to see)

-- Lars

________________________________
From: Ted Yu <yuzhih...@gmail.com>
To: lars hofhansl <la...@apache.org>
Cc: "dev@hbase.apache.org" <dev@hbase.apache.org>
Sent: Sunday, August 4, 2013 7:39 PM
Subject: Re: Heads up, HTablePool will be deprecated in 0.94, 0.95/0.96, and removed in 0.98

In the Connections "managing" HTables case, don't we need to figure out when an HConnection should be released?

On Sun, Aug 4, 2013 at 7:23 PM, lars hofhansl <la...@apache.org> wrote:

>Just look at the HConnectionKey part, and the hoops we go through to detect
>whether HConnections are the same or not, when to cache them, when/how to
>release them.
>In fact almost all HConnectionManager does is managing HConnections on behalf
>of HTable, when it should be the other way around.
>
>Typically, when things get hard to explain (check out the comments in
>HConnectionManager) there is either an abstraction missing, or the abstraction
>is not right.
>The reverse (Connections "managing" HTables) has none of this.
>
>-- Lars
>
>_______________________________
>From: Ted Yu <yuzhih...@gmail.com>
>To: dev@hbase.apache.org; lars hofhansl <la...@apache.org>
>Sent: Sunday, August 4, 2013 4:27 PM
>Subject: Re: Heads up, HTablePool will be deprecated in 0.94, 0.95/0.96, and
>removed in 0.98
>
>bq. no funny business with unique Configurations
>
>Mind telling us what is funny about this part?
>
>On Sat, Aug 3, 2013 at 10:41 PM, lars hofhansl <la...@apache.org> wrote:
>
>>Correct. The HConnection is naturally shared between the HTables.
>>There is no longer any need to worry about this (no funny business with
>>unique Configurations; in fact most of the code in HConnectionManager can be
>>removed in trunk).
>>
>>It is also correct that the code now has to hold on to the created
>>HConnection, rather than asking HConnectionManager for it.
>>
>>-- Lars
>>
>>________________________________
>>From: Nick Dimiduk <ndimi...@gmail.com>
>>To: dev@hbase.apache.org
>>Sent: Saturday, August 3, 2013 8:56 PM
>>Subject: Re: Heads up, HTablePool will be deprecated in 0.94, 0.95/0.96, and
>>removed in 0.98
>>
>>On Sat, Aug 3, 2013 at 8:52 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>
>>> Does this mean that user code wouldn't be able to depend
>>> on HConnectionManager for connection sharing?
>>>
>>
>>My read of the above is that the HConnection instance that is shared across
>>consumers is the shared connection. Am I reading that correctly?
>>
>>On Sat, Aug 3, 2013 at 7:20 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>
>>> > Ah, I found the JIRA - HBASE-9117.
>>> >
>>> > Cheers
>>> >
>>> > On Fri, Aug 2, 2013 at 10:54 PM, lars hofhansl <la...@apache.org> wrote:
>>> >
>>> >> Yeah, I filed a separate ticket for the API removal in trunk.
>>> >>
>>> >> ________________________________
>>> >> From: Ted Yu <yuzhih...@gmail.com>
>>> >> To: dev@hbase.apache.org; lars hofhansl <la...@apache.org>
>>> >> Sent: Friday, August 2, 2013 10:31 PM
>>> >> Subject: Re: Heads up, HTablePool will be deprecated in 0.94, 0.95/0.96,
>>> >> and removed in 0.98
>>> >>
>>> >> bq. HConnectionManager.getConnection() will be removed.
>>> >>
>>> >> I don't see the above change in 6580-trunk.txt
>>> >> Would the above be done in the next patch or in another JIRA?
>>> >>
>>> >> Cheers
>>> >>
>>> >> On Fri, Aug 2, 2013 at 9:29 PM, lars hofhansl <la...@apache.org> wrote:
>>> >>
>>> >> > See https://issues.apache.org/jira/browse/HBASE-6580
>>> >> >
>>> >> > Here's the proposed new API:
>>> >> > * HConnectionManager:
>>> >> >     public static HConnection createConnection(Configuration conf)
>>> >> >     public static HConnection createConnection(Configuration conf, ExecutorService pool)
>>> >> >
>>> >> > * HConnection:
>>> >> >     public HTableInterface getTable(byte[] tableName) throws IOException
>>> >> >     public HTableInterface getTable(byte[] tableName, ExecutorService pool) throws IOException
>>> >> >     public HTableInterface getTable(String tableName) throws IOException
>>> >> >
>>> >> > By default HConnectionImplementation will create an ExecutorService when
>>> >> > needed. The ExecutorService can optionally be passed in.
>>> >> > HTableInterfaces are retrieved from the HConnection. By default the
>>> >> > HConnection's ExecutorService is used, but optionally that can be
>>> >> > overridden for each HTable.
>>> >> >
>>> >> > In 0.98/trunk:
>>> >> >
>>> >> > 1. HTablePool will be removed. It is no longer needed.
>>> >> > 2. All constructors in HTable will be removed and changed to be protected.
>>> >> >    All code should use HTableInterface only.
>>> >> > 3. HConnectionManager.getConnection() will be removed.
>>> >> > 4. All HConnection caching (deleteConnection, etc.) will be removed, as
>>> >> >    it is no longer needed.
>>> >> >
>>> >> > The new flow of setting up a client would look like this:
>>> >> >
>>> >> > ----- Snip -----
>>> >> > // connection to the cluster
>>> >> > HConnection conn = HConnectionManager.createConnection(conf);
>>> >> > ...
>>> >> > // When the cluster connection is established, get an HTableInterface for
>>> >> > // each operation or thread.
>>> >> > // HConnection.getTable(...) is lightweight. The table is really just a
>>> >> > // convenient place to call table methods and for a temporary batch cache.
>>> >> > // It is in fact less overhead than HTablePool had when retrieving a
>>> >> > // cached HTable.
>>> >> > // As before, the HTableInterface returned is not thread safe.
>>> >> > // It's fine to get 1000's of these.
>>> >> > // Don't cache them longer than the lifetime of the HConnection.
>>> >> > HTableInterface table = conn.getTable("MyTable");
>>> >> > ...
>>> >> > // Just flushes outstanding commits; no further cleanup needed, can be
>>> >> > // omitted.
>>> >> > // HConnection holds no references to the returned HTable objects; they
>>> >> > // can be GC'd as soon as they leave scope.
>>> >> > table.close();
>>> >> > ...
>>> >> > conn.close(); // done with the cluster, release resources
>>> >> > ----- Snip -----
>>> >> >
>>> >> > The HConnection will maintain and share its own ThreadPool for all batch
>>> >> > operations executed by the HTables.
>>> >> > This can be overridden per HConnection and/or per individual HTable object.
>>> >> >
>>> >> > I will commit the new API to all branches early next week.
>>> >> >
>>> >> > Questions? Comments? Concerns? Praise?
>>> >> >
>>> >> > -- Lars
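
For readers following the thread, the ownership model in the proposal (the connection owns a shared thread pool, table handles are cheap and only buffer writes, and closing a table merely flushes) can be modelled as a small toy. This is only a sketch under stated assumptions: `SketchConnection`, `SketchTable` and `SketchDemo` are hypothetical names invented here, none of this is HBase code, and the decision to shut down only a pool the connection created itself is a choice of the sketch, not something the thread specifies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for a cluster connection; this is NOT HBase's HConnection.
final class SketchConnection implements AutoCloseable {
    private final ExecutorService pool; // shared by every table unless overridden
    private final boolean ownsPool;     // sketch choice: only shut down a pool we created
    private volatile boolean closed;

    SketchConnection() {
        this(Executors.newFixedThreadPool(2, r -> {
            Thread t = new Thread(r, "sketch-conn-worker");
            t.setDaemon(true); // a forgotten close() cannot keep the JVM alive
            return t;
        }), true);
    }

    SketchConnection(ExecutorService pool) {
        this(pool, false);
    }

    private SketchConnection(ExecutorService pool, boolean ownsPool) {
        this.pool = pool;
        this.ownsPool = ownsPool;
    }

    // Cheap: allocates a small handle; the connection keeps no reference to it.
    SketchTable getTable(String name) {
        return getTable(name, pool);
    }

    // Same, but batch work for this one table runs on the caller's pool.
    SketchTable getTable(String name, ExecutorService override) {
        if (closed) throw new IllegalStateException("connection is closed");
        return new SketchTable(name, override);
    }

    ExecutorService pool() { return pool; }
    boolean isClosed() { return closed; }

    @Override
    public void close() {
        closed = true;
        if (ownsPool) pool.shutdown();
    }
}

// Hypothetical lightweight, non-thread-safe table handle with a small write buffer.
final class SketchTable implements AutoCloseable {
    private final String name;
    private final ExecutorService pool;
    private final List<String> buffer = new ArrayList<>();
    private int flushed;

    SketchTable(String name, ExecutorService pool) {
        this.name = name;
        this.pool = pool;
    }

    void put(String row) { buffer.add(row); }
    String name() { return name; }
    ExecutorService pool() { return pool; }
    int flushedCount() { return flushed; }

    // Closing only flushes buffered rows through the pool; nothing else to release.
    @Override
    public void close() {
        if (buffer.isEmpty()) return;
        List<String> batch = new ArrayList<>(buffer);
        buffer.clear();
        Callable<Integer> send = batch::size; // pretend "sending" just counts rows
        try {
            flushed += pool.submit(send).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }
}

final class SketchDemo {
    public static void main(String[] args) {
        SketchConnection conn = new SketchConnection();
        SketchTable table = conn.getTable("MyTable");
        table.put("row1");
        table.put("row2");
        table.close(); // flushes via the connection's pool
        System.out.println("flushed rows: " + table.flushedCount());
        conn.close();  // releases the pool the connection created
    }
}
```

In this toy, a table that still has buffered rows after its connection was closed fails on `close()`, because the pool has already been shut down; that mirrors the advice in the proposal not to keep tables longer than the connection, without claiming to reproduce how HBase itself behaves.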