Hi All, I am trying to figure out a good solution for the following scenario.
1. I have a 2T file (let's call it A) filled with key/value pairs, stored in HDFS with the default 64M block size. In A, each key is less than 1K and each value is about 20M.
2. Occasionally, I run an analysis using a different type of data (usually less than 10G; let's call it B) and do lookup-table-like operations against the values in A. B resides in HDFS as well.
3. The analysis requires loading only a small number of values from A (usually fewer than 1000) into memory for fast lookup against the data in B. B finds those values by looking up their keys in A.

Is there an efficient way to do this? I was thinking that if I could identify the locality of the blocks that contain those few values, I might be able to push B onto the few nodes that hold them (a rough sketch of that idea is in the P.S. below). Since I only need to do this occasionally, maintaining a distributed database such as HBase can't be justified.

Many thanks.
Cao
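
P.S. To make the locality idea a bit more concrete, below is a rough, untested sketch of what I had in mind. It assumes I already know the byte offset and length of each needed value inside A (say, from a small side index I would have to build first; that index is an assumption on my part). The sketch just asks, via the FileSystem API, which hosts hold the blocks covering a given value, so B could then be pushed to those nodes:

import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ValueLocality {
    // Print the datanodes hosting the block(s) that cover the byte range
    // [offset, offset + length) of file A. offset/length for each needed
    // value are assumed to come from a small side index built beforehand.
    public static void printHosts(Configuration conf, Path a, long offset, long length)
            throws Exception {
        FileSystem fs = a.getFileSystem(conf);
        FileStatus status = fs.getFileStatus(a);
        BlockLocation[] blocks = fs.getFileBlockLocations(status, offset, length);
        for (BlockLocation block : blocks) {
            System.out.println("block at offset " + block.getOffset()
                    + " lives on " + Arrays.toString(block.getHosts()));
        }
    }
}

Does this look like a reasonable direction, or is there a more standard way to do occasional keyed lookups against a file like A?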
