This might be a simplified view, but this is how I understand it.


HBase stores the data in a distributed way, by using various RegionServers.

MapReduce distributes computations, by using TaskTrackers.

So when a MapReduce job is run, it tries to run the Map/Reduce operations on TaskTrackers co-located with the RegionServers serving the data, thus co-locating the computation with the data.


So, if you use Map/Reduce, you get computation and data distribution by default, as well as a best effort to co-locate computation with data, maximizing efficiency as much as possible.
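That "best effort" placement (node-local first, then rack-local, as described further down the thread) can be sketched as a toy model. This is not the actual JobTracker scheduler, just an illustration; the host and rack naming scheme here is invented for the example.

```java
import java.util.Arrays;
import java.util.List;

// Toy model of how a map task for an HBase region gets placed:
// prefer a TaskTracker on the same node as the RegionServer,
// then one on the same rack, then any tracker at all.
public class LocalityScheduler {

    // Rough rack lookup, assuming the rack is encoded in the hostname,
    // e.g. "rack1-node3" lives on "rack1". Real Hadoop uses a pluggable
    // topology script for this.
    static String rackOf(String host) {
        return host.split("-")[0];
    }

    static String pickTracker(String regionServerHost, List<String> trackers) {
        // 1. Node-local: a tracker on the RegionServer's own machine.
        for (String t : trackers) {
            if (t.equals(regionServerHost)) return t;
        }
        // 2. Rack-local: a tracker on the same rack.
        for (String t : trackers) {
            if (rackOf(t).equals(rackOf(regionServerHost))) return t;
        }
        // 3. Fall back to any tracker (data crosses racks).
        return trackers.get(0);
    }

    public static void main(String[] args) {
        List<String> trackers =
            Arrays.asList("rack1-node1", "rack1-node2", "rack2-node1");
        System.out.println(pickTracker("rack1-node2", trackers)); // node-local
        System.out.println(pickTracker("rack1-node3", trackers)); // rack-local
        System.out.println(pickTracker("rack3-node1", trackers)); // remote fallback
    }
}
```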

Now, you don't have to use Map/Reduce, but then you will have to take the extra effort to distribute your computations yourself, and try to co-locate them with the data.

That is in fact something I'm planning on doing, since I'm not sure yet whether my computations are suited to Map/Reduce. So I will probably run my own Java process co-located with the HBase RegionServers, and make sure that when my code asks for data, it gets the local data as much as possible.
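A rough sketch of that "local data first" idea: given a region-to-server mapping, a co-located process would work on the regions its own host serves before touching anything remote. The region names and hosts below are invented for illustration; in real code the mapping would come from the HBase client's table metadata rather than a hand-built map.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch for a process co-located with a RegionServer: pick out the
// regions served by this host so they can be read without a network hop.
public class LocalFirstReader {

    static List<String> localRegions(Map<String, String> regionToServer,
                                     String localHost) {
        List<String> local = new ArrayList<String>();
        for (Map.Entry<String, String> e : regionToServer.entrySet()) {
            if (e.getValue().equals(localHost)) {
                // This region is served on our own machine: process it here.
                local.add(e.getKey());
            }
        }
        return local;
    }

    public static void main(String[] args) {
        Map<String, String> regions = new LinkedHashMap<String, String>();
        regions.put("region-a", "host1");
        regions.put("region-b", "host2");
        regions.put("region-c", "host1");
        System.out.println(localRegions(regions, "host1")); // [region-a, region-c]
    }
}
```

Regions not in the local list would still have to be fetched over the network, which is exactly the overhead the Map/Reduce path avoids automatically.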




On 7/21/09 11:48 PM, bharath vissapragada wrote:
That means it is not very useful to write Java code (using the API), because
it is not using the real power of Hadoop (distributed processing); instead it
has the overhead of fetching data from other machines, right?

On Wed, Jul 22, 2009 at 12:12 PM, Amandeep Khurana<[email protected]>  wrote:

Yes, only if you use MR. If you are writing your own code, it'll pull the
records to the place where you run the code.

On Tue, Jul 21, 2009 at 11:39 PM, Fernando Padilla <[email protected]> wrote:
That is if you use Hadoop MapReduce, right? Not if you simply access HBase
through the standard API (like Java)?



On 7/21/09 9:49 PM, Amandeep Khurana wrote:

Bharath,

The processing is done as local to the RS as possible. The first attempt is
at doing it locally on the same node. If that's not possible, it's done on
the same rack.

-ak


On Tue, Jul 21, 2009 at 9:43 PM, bharath vissapragada <[email protected]> wrote:

Hi all,
I have one simple doubt about HBase.

Suppose I use a scanner to iterate through all the rows in HBase and process
the data in the table corresponding to those rows. Is the processing of that
data done locally on the RegionServer where that particular region is
located, or is the data transferred over the network so that all the
processing is done on the single machine where the script runs?

thanks


