I really couldn't be specific.

The more data that has to be moved across the wire, the more network i/o.

For example, if you have very large values and a very large table, and you use that table as the input to your MR job, you could potentially be network i/o bound.

It should be very easy to test how your own jobs run on your own cluster using Ganglia and hadoop/mr logging/output.
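
For instance, here is a rough sketch (plain Hadoop mapreduce API, not specific to any particular job) of dumping a finished job's counters so you can line the byte counts up against the per-node network graphs in Ganglia:

// Print every counter the framework collected for a finished job.
// Compare the bytes-read/bytes-written counters against the network
// graphs in Ganglia to see how much of the input came over the wire.
import org.apache.hadoop.mapreduce.Counter;
import org.apache.hadoop.mapreduce.CounterGroup;
import org.apache.hadoop.mapreduce.Job;

public class CounterDump {
  public static void dump(Job job) throws Exception {
    for (CounterGroup group : job.getCounters()) {
      for (Counter counter : group) {
        System.out.println(group.getDisplayName() + ": "
            + counter.getDisplayName() + " = " + counter.getValue());
      }
    }
  }
}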

bharath vissapragada wrote:
JG

Can you please elaborate on the last statement, "for some", by giving an
example or some kind of scenario in which this can happen when MR jobs
involve a huge amount of data?

Thanks.

On Fri, Aug 21, 2009 at 11:24 PM, Jonathan Gray <jl...@streamy.com> wrote:

Ryan,

In older versions of HBase, when we did not attempt any data locality, we
had a few users running jobs that became network i/o bound.  It wasn't a
latency issue; it was a bandwidth issue.

That's actually when/why an attempt at better data locality for HBase MR
was made in the first place.

I hadn't personally experienced it, but I recall two users who had. After
they made a first-stab patch, I ran some comparisons and noticed a
significant reduction in network i/o for data-intensive MR jobs.  They also
were no longer network i/o bound on their jobs, if I recall, and became disk
i/o bound (as one would expect/hope).

For a majority of use cases, it doesn't matter in a significant way at all.
 But I have seen it make a measurable difference for some.

JG


bharath vissapragada wrote:

Thanks Ryan

I was just explaining with an example .. I have TBs of data to work
with. I just wanted to know whether the scheduler TRIES to assign the reduce
phase so that the data stays local (i.e., TRYING to assign it to the machine
with the greater number of key values).

Thanks for your reply (following you on Twitter :))

On Fri, Aug 21, 2009 at 12:13 PM, Ryan Rawson <ryano...@gmail.com> wrote:

hey,
Yes, the hadoop system attempts to assign map tasks so they are data-local,
but why would you be worried about this for 5 values?  The max value size
in hbase is Integer.MAX_VALUE, so it's not like you have much data to
shuffle. Once your blobs grow past ~64mb or so, it might make more sense to
use HDFS directly and keep only the metadata in hbase (including
things like the location of the data blob).
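
A rough sketch of that pattern, purely for illustration -- the "/blobs" path, the "blobs" table, and the "meta" family are made-up names, and the exact HBase client calls vary a little between versions:

// Write the large blob straight into HDFS, and keep only its location
// (plus a little metadata) as a row in HBase.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class BlobStore {
  public static void storeBlob(Configuration conf, String rowKey, byte[] blob)
      throws Exception {
    // The blob itself goes to HDFS...
    FileSystem fs = FileSystem.get(conf);
    Path blobPath = new Path("/blobs/" + rowKey);
    FSDataOutputStream out = fs.create(blobPath);
    out.write(blob);
    out.close();

    // ...and HBase only stores where to find it and how big it is.
    HTable table = new HTable(HBaseConfiguration.create(conf), "blobs");
    Put put = new Put(Bytes.toBytes(rowKey));
    put.add(Bytes.toBytes("meta"), Bytes.toBytes("path"),
            Bytes.toBytes(blobPath.toString()));
    put.add(Bytes.toBytes("meta"), Bytes.toBytes("size"),
            Bytes.toBytes((long) blob.length));
    table.put(put);
    table.close();
  }
}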

I think people are confused about how optimal map reduces have to be.
Keeping all the data super-local on each machine doesn't always help
you, since you have to read via a socket anyways. Going remote doesn't
actually make things that much slower, since on a modern LAN ping
times are < 0.1ms.  If your entire cluster is hanging off a single
switch, there is nearly unlimited bandwidth between all nodes
(certainly much higher than any single system could push).  Only once
you go multi-switch does switch locality (aka rack locality) become
important.

Remember, hadoop isn't about the instantaneous speed of any one job, but
about running jobs in a highly scalable manner that works on tens to
tens of thousands of nodes. You end up blocking on single-machine
limits anyways, and the r=3 replication of HDFS helps you transcend a
single machine's read speed for large files. Keeping the data transfer
local in this case results in lower performance.

If you want max local speed, I suggest looking at CUDA.


On Thu, Aug 20, 2009 at 9:09 PM, bharath
vissapragada<bharathvissapragada1...@gmail.com> wrote:

Aamandeep, Gray and Purtell, thanks for your replies .. I have found them
very useful.

You said to increase the number of reduce tasks. Suppose the number of
reduce tasks is more than the number of distinct map output keys; will some
of the reduce processes go to waste? Is that the case?

Also I have one more doubt .. I have 5 values for a corresponding key on one
region and the other 2 values on 2 different region servers.
Does hadoop MapReduce take care of moving these 2 values to the region
with the 5 values, instead of moving those 5 values to another system, to
minimize the dataflow? Is this what is happening inside?

On Fri, Aug 21, 2009 at 9:03 AM, Andrew Purtell <apurt...@apache.org> wrote:

The behavior of TableInputFormat is to schedule one mapper for every table
region.
In addition to what others have said already, if your reducer is doing
little more than storing data back into HBase (via TableOutputFormat), then
you can consider writing results back to HBase directly from the mapper to
avoid incurring the overhead of sort/shuffle/merge which happens within the
Hadoop job framework as map outputs are fed into reducers. For that type of
use case -- using the Hadoop mapreduce subsystem as essentially a grid
scheduler -- something like job.setNumReduceTasks(0) will do the trick.
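
For illustration, a minimal sketch of that map-only setup. The "source" and "results" table names and the "cf" family are made up, and the helper signatures can differ slightly between HBase versions:

// Map-only job: read from one HBase table and write Puts straight back
// out through TableOutputFormat.  With zero reducers there is no
// sort/shuffle/merge step at all.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Job;

public class MapOnlyJob {

  static class CopyMapper extends TableMapper<ImmutableBytesWritable, Put> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context ctx)
        throws IOException, InterruptedException {
      Put put = new Put(row.get());
      // Illustrative transform: copy the first cell's value into a new column.
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("copy"), value.value());
      ctx.write(row, put);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "map-only hbase job");
    job.setJarByClass(MapOnlyJob.class);

    Scan scan = new Scan();
    TableMapReduceUtil.initTableMapperJob("source", scan, CopyMapper.class,
        ImmutableBytesWritable.class, Put.class, job);
    // null reducer class: output goes straight to the "results" table.
    TableMapReduceUtil.initTableReducerJob("results", null, job);

    job.setNumReduceTasks(0);  // the actual map-only switch
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}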
Best regards,

 - Andy




________________________________
From: john smith <js1987.sm...@gmail.com>
To: hbase-user@hadoop.apache.org
Sent: Friday, August 21, 2009 12:42:36 AM
Subject: Doubt in HBase

Hi all,

I have one small doubt. Kindly answer it even if it sounds silly.

I am using MapReduce with HBase in distributed mode. I have a table which
spans across 5 region servers. I am using TableInputFormat to read the data
from the table in the map. When I run the program, by default how many map
tasks are created? Is it one per region server or more?
Also, after the map task is over, the reduce task is taking a bit more time.
Is it due to moving the map output across the regionservers, i.e., moving
the values of the same key to a particular reduce phase to start the
reducer? Is there any way I can optimize the code (e.g. by storing data of
the same reducer nearby)?
Thanks :)





