Yes, you can/will have contention when sharing resources like that. Most clusters are built on 4-core machines with 4GB of RAM (some slightly worse, some slightly better), so there are sufficient resources to go around.
You'll need to limit the total number of maps/reduces allowed per node to
ensure that running tasks do not starve the DataNode or RegionServer. The
limit depends on the nature of your tasks. If they are CPU-bound, you would
want to make sure no more than 2 (or 3 if you want to push it) were running
on any given node if you had four cores. (The relevant Hadoop properties are
sketched in the P.S. at the end of this message.)

JG

> -----Original Message-----
> From: Sean Laurent [mailto:[email protected]]
> Sent: Tuesday, February 03, 2009 2:49 PM
> To: [email protected]
> Subject: Re: HBase and Hadoop MapReduce - Common setups?
>
> Okay, that sounds like what I expected. But isn't there a strong
> likelihood of competition for HDFS resources between a M/R task running
> on a TaskTracker and the RegionServer running on the same machine?
>
> In other words, let's say a Hadoop M/R task is running on a given
> TaskTracker and it's actively reading data from HDFS via the DataNode
> (and both are on the same machine for locality reasons). At the same
> time, another client is running an HBase BatchUpdate that affects the
> data stored on that very same DataNode. Won't that create a bottleneck?
> Or do HBase operations like BatchUpdate actually run as M/R tasks? Or
> am I overestimating the data-retrieval problem?
>
> Thanks!
>
> -Sean
>
> On Tue, Feb 3, 2009 at 4:42 PM, Jonathan Gray <[email protected]> wrote:
>
> > Sean,
> >
> > You're going to want to run your TaskTrackers local to your DataNodes
> > and RegionServers, again for locality reasons. That's one of the
> > primary advantages of MapReduce: moving computation to the data.
> >
> > Otherwise, you are on track. Of course the setup depends on what
> > you're doing, but what you describe matches the majority of the HBase
> > setups I'm aware of.
> >
> > JG
> >
> > > -----Original Message-----
> > > From: Sean Laurent [mailto:[email protected]]
> > > Sent: Tuesday, February 03, 2009 2:13 PM
> > > To: [email protected]
> > > Subject: HBase and Hadoop MapReduce - Common setups?
> > >
> > > Howdy folks,
> > > We're evaluating HBase and trying to get a good, solid picture of
> > > how everything fits together... specifically, we're wondering how
> > > people commonly set up HBase. I imagine you typically run the
> > > region servers on the same machines as the HDFS data nodes to gain
> > > data locality benefits. And from what I've seen on the mailing
> > > list, it's typically recommended (although it sounds like it's up
> > > for debate in terms of SPoF issues) to run separate machines for
> > > the HBaseMaster and NameNode servers.
> > >
> > > Is it something along the following lines?
> > >
> > > 1x HBaseMaster
> > > 1x HDFS NameNode
> > > N machines with both HRegionServer and DataNode
> > >
> > > Now what about Hadoop and task trackers? Do people typically run
> > > completely separate clusters for their M/R tasks? Do they run task
> > > trackers alongside the region servers and data nodes? Or add
> > > machines that run TaskTracker and DataNode servers but ~not~
> > > HRegionServer?
> > >
> > > Any thoughts or opinions would be greatly appreciated!
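P.S. For concreteness, here is a minimal sketch of the per-node task limits
I mentioned above. These are the standard Hadoop properties that cap
concurrent tasks per TaskTracker; which config file they live in
(hadoop-site.xml vs. mapred-site.xml) depends on your Hadoop version, and
the values of 2 are only an example for a 4-core box shared with a DataNode
and RegionServer -- tune them for your own workload:

    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>2</value>
      <description>Cap on concurrent map tasks per TaskTracker, kept low
      so the co-located DataNode and RegionServer are not starved for
      CPU.</description>
    </property>
    <property>
      <name>mapred.tasktracker.reduce.tasks.maximum</name>
      <value>2</value>
      <description>Cap on concurrent reduce tasks per
      TaskTracker.</description>
    </property>

Note these are read by the TaskTracker at startup, so they have to be set
in the node's Hadoop configuration (and the TaskTracker restarted), not
per job.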
