Re: Fast retrieval of multiple rows with non-sequential keys

Jonathan Gray Mon, 05 Oct 2009 06:38:12 -0700

This is being worked on.  Ideally, a solution would batch the gets by region
and then by regionserver, so that the total number of RPC calls would be at
most the number of regionservers.
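
To make the batching idea concrete, here is a minimal sketch of the grouping
step only. It is not the HBASE-1845 implementation. The class GetBatcher and
the two lookup functions (regionOf, serverOf) are hypothetical stand-ins for
whatever region-location lookup the client would really use, and no HBase
classes are involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical helper, not part of HBase: groups row keys so that one
// request per regionserver can carry all the gets for that server.
class GetBatcher {

    /**
     * Returns server -> (region -> row keys).
     * regionOf and serverOf are assumed lookups supplied by the caller.
     */
    static Map<String, Map<String, List<String>>> groupByServerThenRegion(
            List<String> rowKeys,
            Function<String, String> regionOf,
            Function<String, String> serverOf) {
        Map<String, Map<String, List<String>>> batches = new TreeMap<>();
        for (String key : rowKeys) {
            String region = regionOf.apply(key);
            String server = serverOf.apply(region);
            batches.computeIfAbsent(server, s -> new TreeMap<>())
                   .computeIfAbsent(region, r -> new ArrayList<>())
                   .add(key);
        }
        return batches;
    }

    public static void main(String[] args) {
        // Toy layout (assumption): the region is the first character of the
        // key; regions "a" and "b" live on rs1, everything else on rs2.
        Map<String, Map<String, List<String>>> batches = groupByServerThenRegion(
                Arrays.asList("a1", "b7", "a2", "c3"),
                key -> key.substring(0, 1),
                region -> region.equals("c") ? "rs2" : "rs1");
        // One RPC per outer entry, so never more RPCs than servers.
        System.out.println(batches.size() + " RPCs: " + batches);
    }
}
```

The outer map has one entry per regionserver that holds at least one
requested row, which is where the "at most the number of servers" bound on
RPC calls comes from.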


Follow HBASE-1845 and related issues.

For now, you can use threads in your application to issue the multiple gets
in parallel.
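
A minimal sketch of that workaround, using only java.util.concurrent. The
class ParallelGets and the fetchOne function are hypothetical. In a real
client fetchOne would wrap a call such as HTable.get(Get), but here it is
just a caller-supplied function so the example stays self-contained.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper, not part of HBase.
class ParallelGets {

    /**
     * Runs fetchOne for every key on a fixed-size thread pool and returns
     * the results in the same order as the input keys.
     */
    static <K, V> List<V> fetchAll(List<K> keys, Function<K, V> fetchOne, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<V>> tasks = new ArrayList<>();
            for (K key : keys) {
                tasks.add(() -> fetchOne.apply(key));
            }
            List<V> results = new ArrayList<>();
            // invokeAll blocks until all tasks finish and returns the
            // futures in the same order as the task list.
            for (Future<V> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while fetching", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a fetch failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Stand-in fetch function (assumption): pretend the stored value
        // is the row key with a prefix.
        List<String> rows = fetchAll(Arrays.asList("row-17", "row-3", "row-99"),
                key -> "value-for-" + key, 4);
        System.out.println(rows);
    }
}
```

This still costs one RPC per row; it only overlaps the round trips. If
fetchOne wraps an HTable, it is safer to give each thread its own HTable
instance rather than sharing one, since HTable is not documented as
thread-safe.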

JG

On Mon, October 5, 2009 3:02 am, Jochen Frey wrote:
> I want to use HBase as a BLOB store for a search engine application.
> For that the objects will be stored in one HBase table (~ 1B rows).
> Object size is typically between 1 kB and 20 kB.
>
>
> I am concerned about my read pattern, where a typical read retrieves
> between tens and thousands of rows in random order. Looking at the Java API
> the only method to retrieve rows in random order is to issue multiple
>
> Result = HTable.get(Get)
>
>
> requests sequentially (I assume a Scanner is not a good idea since the
> rows I need are spread randomly across the table / regions / etc.).
>
> My concern is that with that pattern I have one RPC call per item,
> which seems to be a lot of overhead, especially when I need to retrieve
> 100s or 1,000s of rows.
>
>
> Would it not be preferable to batch up requests so that all rows
> requested would be grouped by region, and then sent off in parallel to
> regions for retrieval - that way there'd be fewer RPC calls, and they
> could be executed in parallel, as well? Such an addition to the
> interface could look something like
>
> List<Result> = HTable.get(List<Get>)
>
>
> Am I making sense? Is there something that I am missing?
>
>
> Thanks!
> Jochen
>
>
>
>
