Moving HBase MR computation close to data

Naama Kraus Sun, 13 Jul 2008 22:00:21 -0700

Hi,

I've peeked at the HBase code, specifically TableSplit#getLocations(). I
noticed that the method returns a random node for now. I was trying to think
about what should be returned if one wishes to have computation close to
data. As a table split is per region, I could think of returning the node
managing that region (the region server). That would get computation close
to data at the HBase level, but not necessarily at the file system level. As
HStoreFiles are stored in HDFS, their actual location could be on a remote
node, and not the region server node.
Can anyone comment on my line of thinking? Am I wrong somewhere?
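
To make the idea concrete, here is a minimal sketch of a per-region split
that reports its region server as the preferred location. It is a
self-contained illustration and not actual HBase code: the class name
RegionSplitSketch, its fields, and the host name are all hypothetical, and
the region server host is assumed to be looked up by whoever creates the
split. Only the shape of getLocations() (an array of host names) follows
Hadoop's InputSplit#getLocations().

```java
import java.util.Arrays;

// Hypothetical stand-in for a per-region table split.
// This is NOT the real HBase TableSplit class.
public class RegionSplitSketch {
    private final String tableName;
    private final String startRow;
    private final String endRow;
    // Assumption: the caller already knows which region server hosts the region.
    private final String regionServerHost;

    public RegionSplitSketch(String tableName, String startRow,
                             String endRow, String regionServerHost) {
        this.tableName = tableName;
        this.startRow = startRow;
        this.endRow = endRow;
        this.regionServerHost = regionServerHost;
    }

    // Same shape as InputSplit#getLocations(): host names the scheduler
    // should prefer when placing the map task for this split.
    public String[] getLocations() {
        return new String[] { regionServerHost };
    }

    public static void main(String[] args) {
        RegionSplitSketch split =
            new RegionSplitSketch("mytable", "a", "m", "rs1.example.com");
        System.out.println(Arrays.toString(split.getLocations()));
    }
}
```

Note that this only expresses locality at the HBase level. It says nothing
about which datanodes hold the HDFS blocks of the region's HStoreFiles,
which is the second half of the question.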


To sum up, I'd like to understand whether there is any notion of bringing
computation close to data when working with HBase. If so, are there any
plans to implement it in future releases? Or is the current implementation
good enough (and if so, can you explain why)?

Thanks for any input, Naama

-- 
oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo
00 oo 00 oo
"If you want your children to be intelligent, read them fairy tales. If you
want them to be more intelligent, read them more fairy tales." (Albert
Einstein)
