This is being worked on. Ideally, a solution would batch requests by region and then by regionserver, so that the total number of RPC calls would be at most the number of servers.
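
To make the batching idea concrete, here is a rough sketch of the grouping step only. It is not HBase client code: `ServerBatcher`, `groupByServer`, and the `serverFor` lookup are hypothetical names, and a real client would resolve row to region to regionserver from its own region cache.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: group row keys by the server hosting them, so that
// one batched request per server is enough. Names are illustrative only.
final class ServerBatcher {
    private ServerBatcher() {}

    // serverFor is an assumed lookup from row key to server name.
    static Map<String, List<String>> groupByServer(
            List<String> rowKeys, Function<String, String> serverFor) {
        Map<String, List<String>> batches = new LinkedHashMap<>();
        for (String row : rowKeys) {
            String server = serverFor.apply(row);
            // Create the per-server batch on first use, then append the row.
            batches.computeIfAbsent(server, s -> new ArrayList<>()).add(row);
        }
        // One entry per distinct server, so one RPC per entry would suffice.
        return batches;
    }
}
```

With the rows grouped this way, the number of requests is bounded by the number of distinct servers rather than the number of rows.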
Follow HBASE-1845 and related issues. For now, you can use threads in your application to run the multiple gets in parallel.

JG

On Mon, October 5, 2009 3:02 am, Jochen Frey wrote:
> I want to use HBase as a BLOB store for a search engine application.
> For that, the objects will be stored in one HBase table (~1B rows).
> Object size is typically between 1 kB and 20 kB.
>
> I am concerned about my read pattern, where our typical reads retrieve
> between tens and thousands of rows in random order. Looking at the Java API,
> the only method to retrieve rows in random order is to issue multiple
>
> Result = HTable.get(Get)
>
> requests sequentially (I assume a Scanner is not a good idea since the
> rows I need are spread randomly across the table / regions / etc.).
>
> My concern is that with that pattern I have one RPC call per item,
> which seems to be a lot of overhead, especially when I need to retrieve
> 100s or 1,000s of rows.
>
> Would it not be preferable to batch up requests so that all rows
> requested would be grouped by region, and then sent off in parallel to
> the regions for retrieval? That way there would be fewer RPC calls, and
> they could be executed in parallel as well. Such an addition to the
> interface could look something like
>
> List<Result> = HTable.get(List<Get>)
>
> Am I making sense? Is there something that I am missing?
>
> Thanks!
> Jochen
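
As an illustration of the interim workaround with threads, the following is a minimal sketch using only the JDK. `ParallelGets`, `getAll`, and the `getOne` function are hypothetical; `getOne` stands in for whatever single-row lookup your application performs (for example a wrapper around `HTable.get(Get)`). Whether a table handle can safely be shared across threads depends on the client version, so treat that as something to check rather than assume.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper: runs many single-row lookups on a fixed thread pool.
final class ParallelGets {
    private ParallelGets() {}

    // getOne is an assumed single-row lookup supplied by the caller.
    // Results are returned in the same order as the requested keys.
    static <K, V> List<V> getAll(List<K> keys, Function<K, V> getOne, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<V>> pending = new ArrayList<>();
            for (K key : keys) {
                Callable<V> task = () -> getOne.apply(key);
                pending.add(pool.submit(task));
            }
            List<V> results = new ArrayList<>();
            for (Future<V> future : pending) {
                results.add(future.get()); // blocks until that lookup finishes
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for gets", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a get failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

This still issues one call per row, so it hides latency rather than reducing the number of RPCs; the pool size is a tuning knob to pick based on how many regionservers you have.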
