A couple more quick questions:

1. On a Hadoop node with DAS, what is the typical storage utilization? In other words, for a given total data size, how much capacity should we plan per node, given that compute is not a major bottleneck?

2. What storage throughput should we expect from the DAS to the compute on the same node, assuming a SATA interface?
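To make question 1 concrete, here is the back-of-envelope arithmetic I have in mind. All of the figures (3x replication, ~25% headroom for shuffle/temp data, 12 disks per node, 2 TB and ~100 MB/s per SATA disk) are my assumptions, not established numbers - please correct them:

    import math

    REPLICATION = 3          # assumed HDFS replication factor (dfs.replication)
    OVERHEAD = 1.25          # assumed ~25% headroom for shuffle/temp/log data
    DISKS_PER_NODE = 12      # assumed DAS spindle count per node
    DISK_CAPACITY_TB = 2.0   # assumed per-disk capacity in TB
    DISK_SEQ_MBPS = 100      # assumed sequential throughput of one SATA disk

    def nodes_needed(total_data_tb):
        """Nodes required to hold total_data_tb of user (pre-replication) data."""
        raw_tb = total_data_tb * REPLICATION * OVERHEAD
        per_node_tb = DISKS_PER_NODE * DISK_CAPACITY_TB
        return math.ceil(raw_tb / per_node_tb)

    # Question 2, same style of estimate: aggregate local DAS throughput per
    # node, assuming sequential access and no controller/backplane bottleneck.
    node_seq_mbps = DISKS_PER_NODE * DISK_SEQ_MBPS

    print(nodes_needed(100))   # 100 TB of user data -> 16 nodes under these numbers
    print(node_seq_mbps)       # ~1200 MB/s aggregate per node in this sketch

Is that roughly the right way to size it, or is utilization in practice planned well below the raw disk capacity?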
Thanks,
Satheesh

On Fri, May 11, 2012 at 12:58 PM, Leo Leung <lle...@ddn.com> wrote:
>
> This may be dated material.
> Cloudera and HDP folks, please correct with updates :)
>
> http://www.cloudera.com/blog/2010/03/clouderas-support-team-shares-some-basic-hardware-recommendations/
> http://www.cloudera.com/blog/2010/08/hadoophbase-capacity-planning/
>
> http://hortonworks.com/blog/best-practices-for-selecting-apache-hadoop-hardware/
>
> Hope this helps.
>
>
> -----Original Message-----
> From: Satheesh Kumar [mailto:nks...@gmail.com]
> Sent: Friday, May 11, 2012 12:48 PM
> To: common-user@hadoop.apache.org
> Subject: Re: Question on MapReduce
>
> Thanks, Leo. What is the configuration of a typical data node in a Hadoop
> cluster - cores, storage capacity, and connectivity (SATA?)? How many
> tasktracker slots are scheduled per core in general?
>
> Is there a best practices guide somewhere?
>
> Thanks,
> Satheesh
>
> On Fri, May 11, 2012 at 10:48 AM, Leo Leung <lle...@ddn.com> wrote:
>
> > Nope, you must tune the config on that specific super node to have
> > more M/R slots (this is for 1.0.x). This does not mean the JobTracker
> > will be eager to stuff that super node with all the M/R jobs at hand.
> >
> > It still goes through the scheduler; the Capacity Scheduler is most
> > likely what you have (check your config).
> >
> > IMO, if the data locality is not going to be there, your cluster is
> > going to suffer from network I/O.
> >
> >
> > -----Original Message-----
> > From: Satheesh Kumar [mailto:nks...@gmail.com]
> > Sent: Friday, May 11, 2012 9:51 AM
> > To: common-user@hadoop.apache.org
> > Subject: Question on MapReduce
> >
> > Hi,
> >
> > I am a newbie to Hadoop and have a quick question on the optimal balance
> > of compute vs. storage resources for MapReduce.
> >
> > If I have a multiprocessor node with 4 processors, will Hadoop
> > schedule a higher number of Map or Reduce tasks on that system than on a
> > uni-processor system? In other words, does Hadoop detect denser
> > systems and schedule more tasks on multiprocessor systems?
> >
> > If yes, does that imply that it makes sense to attach higher-capacity
> > storage, to hold a larger number of blocks, on systems with denser compute?
> >
> > Any insights will be very useful.
> >
> > Thanks,
> > Satheesh
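P.S. For anyone else following the thread: my reading of Leo's point about tuning the config on the denser node is the per-tasktracker slot settings in mapred-site.xml. The property names below are the Hadoop 1.0.x ones; the values are illustrative guesses for a 4-core node, not recommendations:

    <!-- mapred-site.xml on the denser node; values are example assumptions -->
    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>8</value>   <!-- assumed: ~2 map slots per core -->
    </property>
    <property>
      <name>mapred.tasktracker.reduce.tasks.maximum</name>
      <value>4</value>   <!-- assumed: roughly half the map slot count -->
    </property>

As Leo says, raising the slot counts only makes the node eligible for more concurrent tasks; the scheduler and data locality still decide what actually lands there.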