Re: Fast retrieval of multiple rows with non-sequential keys

Jochen Frey Mon, 05 Oct 2009 07:07:46 -0700

Thanks JG.

I'll check out JIRA and educate myself.

If I had my wish, I'd get the results streamed back to me, so that I can start work on the results while they're being retrieved.


:-)

J

On Oct 5, 2009, at 3:36 PM, Jonathan Gray wrote:

This is being worked on. Ideally, a solution would batch things by region
and then by regionserver, so that the total number of RPC calls would be at
most the number of servers.

Follow HBASE-1845 and related issues.
For now, you can use threads in your application to run the multiple gets
in parallel.
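
The thread-based workaround could look roughly like the sketch below. It is only an illustration under stated assumptions: `ParallelGets`, `fetchAll`, and `fetchOne` are hypothetical names, and `fetchOne` stands in for whatever single-row lookup the application uses (for example, a wrapper around `HTable.get(Get)`). The lookup must be safe to call from several threads, so whether a table handle can be shared between threads is something to verify for the client version in use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class ParallelGets {
    /**
     * Runs one lookup per key on a fixed-size thread pool and returns the
     * results in the same order as the keys. fetchOne is a hypothetical
     * stand-in for a single-row get; it is not part of any HBase API.
     */
    public static <K, V> List<V> fetchAll(List<K> keys,
                                          Function<K, V> fetchOne,
                                          int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit every lookup first so they can run concurrently.
            List<Future<V>> futures = new ArrayList<>();
            for (K key : keys) {
                Callable<V> task = () -> fetchOne.apply(key);
                futures.add(pool.submit(task));
            }
            // Collect in submission order; get() blocks until each is done.
            List<V> results = new ArrayList<>();
            for (Future<V> future : futures) {
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

This still issues one call per row, so it hides latency rather than reducing the number of RPCs; the pool size caps how many lookups are in flight at once.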

JG

On Mon, October 5, 2009 3:02 am, Jochen Frey wrote:
I want to use HBase as a BLOB store for a search engine application.
For that the objects will be stored in one HBase table (~ 1B rows).
Object size is typically between 1 kB and 20 kB.


I am concerned about my read pattern, where a typical read retrieves
between tens and thousands of rows in random order. Looking at the Java API,
the only way to retrieve rows in random order is to issue multiple

Result = HTable.get(Get)

requests sequentially (I assume a Scanner is not a good idea since the
rows I need are spread randomly across the table / regions / etc.).

My concern is that with that pattern I have one RPC call per item,
which seems to be a lot of overhead, especially when I need to retrieve
100s or 1,000s of rows.


Would it not be preferable to batch up requests so that all rows
requested would be grouped by region, and then sent off in parallel to the
regions for retrieval? That way there'd be fewer RPC calls, and they
could be executed in parallel as well. An addition to the
interface could look something like

List<Result> = HTable.get(List<Get>)
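
The grouping step behind such a call can be sketched as follows. This is a minimal illustration, not how HBase does or would implement it: `RegionBatcher`, `groupByRegion`, and the `regionsByStartKey` map are hypothetical, and row keys are modelled as Java strings rather than byte arrays for brevity. It assumes each region is identified by its start key and that the first region starts at the empty key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class RegionBatcher {
    /**
     * Groups row keys by the region that would hold them.
     * regionsByStartKey maps each region's start key to a region name
     * (a hypothetical, simplified view of the region layout). A row belongs
     * to the region with the greatest start key that is <= the row key.
     */
    public static Map<String, List<String>> groupByRegion(
            TreeMap<String, String> regionsByStartKey, List<String> rowKeys) {
        Map<String, List<String>> batches = new TreeMap<>();
        for (String rowKey : rowKeys) {
            Map.Entry<String, String> region = regionsByStartKey.floorEntry(rowKey);
            if (region == null) {
                throw new IllegalArgumentException("No region covers row key: " + rowKey);
            }
            // One batch per region; each batch could then be sent as one request.
            batches.computeIfAbsent(region.getValue(), r -> new ArrayList<>())
                   .add(rowKey);
        }
        return batches;
    }
}
```

With batches built this way, the number of requests is bounded by the number of regions touched rather than by the number of rows requested.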


Am I making sense? Is there something that I am missing?


Thanks!
Jochen


---
m: [email protected]
p: +1.415.706.1341
