Yes, you can/will have contention when sharing resources like that. Most clusters are built on 4-core machines with 4GB of RAM (some slightly worse, some slightly better), so there are sufficient resources to go around.
You'll need to limit the total number of maps/reduces allowed per node to
ensure that running tasks do not starve the DataNode or RegionServer. The
limit depends on the nature of your tasks. If they are CPU-bound, you would
want to make sure no more than 2 (or 3 if you want to push it) were running
on any given node if you had four cores. (The relevant Hadoop properties are
sketched in the P.S. at the end of this message.)

JG

> -----Original Message-----
> From: Sean Laurent [mailto:[email protected]]
> Sent: Tuesday, February 03, 2009 2:49 PM
> To: [email protected]
> Subject: Re: HBase and Hadoop MapReduce - Common setups?
>
> Okay, that sounds like what I expected. But isn't there a strong
> likelihood of competition for HDFS resources between a M/R task running
> on a TaskTracker and the RegionServer running on the same machine?
>
> In other words, let's say a Hadoop M/R task is running on a given
> TaskTracker and it's actively reading data from HDFS via the DataNode
> (and both are on the same machine for locality reasons). At the same
> time, another client is running an HBase BatchUpdate that affects the
> data stored on that very same DataNode. Won't that create a bottleneck?
> Or do HBase operations like BatchUpdate actually run as M/R tasks? Or
> am I overestimating the data-retrieval problem?
>
> Thanks!
>
> -Sean
>
> On Tue, Feb 3, 2009 at 4:42 PM, Jonathan Gray <[email protected]> wrote:
>
> > Sean,
> >
> > You're going to want to run your TaskTrackers local to your DataNodes
> > and RegionServers, again for locality reasons. That's one of the
> > primary advantages of MapReduce: moving computation to the data.
> >
> > Otherwise, you are on track. Of course the setup depends on what
> > you're doing, but what you describe matches the majority of the HBase
> > setups I'm aware of.
> >
> > JG
> >
> > > -----Original Message-----
> > > From: Sean Laurent [mailto:[email protected]]
> > > Sent: Tuesday, February 03, 2009 2:13 PM
> > > To: [email protected]
> > > Subject: HBase and Hadoop MapReduce - Common setups?
> > >
> > > Howdy folks,
> > > We're evaluating HBase and trying to get a good, solid picture of
> > > how everything fits together... specifically, we're wondering how
> > > people commonly set up HBase. I imagine you typically run the
> > > region servers on the same machines as the HDFS data nodes to gain
> > > data locality benefits. And from what I've seen on the mailing
> > > list, it's typically recommended (although it sounds like it's up
> > > for debate in terms of SPoF issues) to run separate machines for
> > > the HBaseMaster and NameNode servers.
> > >
> > > Is it something along the following lines?
> > >
> > > 1x HBaseMaster
> > > 1x HDFS NameNode
> > > N machines with both HRegionServer and DataNode
> > >
> > > Now what about Hadoop and task trackers? Do people typically run
> > > completely separate clusters for their M/R tasks? Do they run task
> > > trackers alongside the region servers and data nodes? Or add
> > > machines that run TaskTracker and DataNode servers but ~not~
> > > HRegionServer?
> > >
> > > Any thoughts or opinions would be greatly appreciated!
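P.S. For concreteness, here is a minimal sketch of the per-node task limits
I mentioned above. These are the standard Hadoop properties that cap
concurrent tasks per TaskTracker; which config file they live in
(hadoop-site.xml vs. mapred-site.xml) depends on your Hadoop version, and
the values of 2 are only an example for a 4-core box shared with a DataNode
and RegionServer -- tune them for your own workload:

    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>2</value>
      <description>Cap on concurrent map tasks per TaskTracker, kept low
      so the co-located DataNode and RegionServer are not starved for
      CPU.</description>
    </property>
    <property>
      <name>mapred.tasktracker.reduce.tasks.maximum</name>
      <value>2</value>
      <description>Cap on concurrent reduce tasks per
      TaskTracker.</description>
    </property>

Note these are read by the TaskTracker at startup, so they have to be set
in the node's Hadoop configuration (and the TaskTracker restarted), not
per job.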
