A couple more quick questions:

1. On a Hadoop node with DAS, what is the typical storage utilization? In other words, for a given total data size, how much capacity should we plan per node, given that compute is not a major bottleneck?

2. What storage throughput should we expect from the DAS to the compute on the same node, assuming a SATA interface?
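To make question 1 concrete, here is the back-of-envelope arithmetic I have in mind. All of the figures (3x replication, ~25% headroom for shuffle/temp data, 12 disks per node, 2 TB and ~100 MB/s per SATA disk) are my assumptions, not established numbers - please correct them:

    import math

    REPLICATION = 3          # assumed HDFS replication factor (dfs.replication)
    OVERHEAD = 1.25          # assumed ~25% headroom for shuffle/temp/log data
    DISKS_PER_NODE = 12      # assumed DAS spindle count per node
    DISK_CAPACITY_TB = 2.0   # assumed per-disk capacity in TB
    DISK_SEQ_MBPS = 100      # assumed sequential throughput of one SATA disk

    def nodes_needed(total_data_tb):
        """Nodes required to hold total_data_tb of user (pre-replication) data."""
        raw_tb = total_data_tb * REPLICATION * OVERHEAD
        per_node_tb = DISKS_PER_NODE * DISK_CAPACITY_TB
        return math.ceil(raw_tb / per_node_tb)

    # Question 2, same style of estimate: aggregate local DAS throughput per
    # node, assuming sequential access and no controller/backplane bottleneck.
    node_seq_mbps = DISKS_PER_NODE * DISK_SEQ_MBPS

    print(nodes_needed(100))   # 100 TB of user data -> 16 nodes under these numbers
    print(node_seq_mbps)       # ~1200 MB/s aggregate per node in this sketch

Is that roughly the right way to size it, or is utilization in practice planned well below the raw disk capacity?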
Thanks,
Satheesh

On Fri, May 11, 2012 at 12:58 PM, Leo Leung <lle...@ddn.com> wrote:
>
> This may be dated material.
> Cloudera and HDP folks, please correct with updates :)
>
> http://www.cloudera.com/blog/2010/03/clouderas-support-team-shares-some-basic-hardware-recommendations/
> http://www.cloudera.com/blog/2010/08/hadoophbase-capacity-planning/
>
> http://hortonworks.com/blog/best-practices-for-selecting-apache-hadoop-hardware/
>
> Hope this helps.
>
>
> -----Original Message-----
> From: Satheesh Kumar [mailto:nks...@gmail.com]
> Sent: Friday, May 11, 2012 12:48 PM
> To: common-user@hadoop.apache.org
> Subject: Re: Question on MapReduce
>
> Thanks, Leo. What is the configuration of a typical data node in a Hadoop
> cluster - cores, storage capacity, and connectivity (SATA?)? How many
> tasktracker slots are scheduled per core in general?
>
> Is there a best practices guide somewhere?
>
> Thanks,
> Satheesh
>
> On Fri, May 11, 2012 at 10:48 AM, Leo Leung <lle...@ddn.com> wrote:
>
> > Nope, you must tune the config on that specific super node to have
> > more M/R slots (this is for 1.0.x). This does not mean the JobTracker
> > will be eager to stuff that super node with all the M/R jobs at hand.
> >
> > It still goes through the scheduler; the Capacity Scheduler is most
> > likely what you have (check your config).
> >
> > IMO, if the data locality is not going to be there, your cluster is
> > going to suffer from network I/O.
> >
> >
> > -----Original Message-----
> > From: Satheesh Kumar [mailto:nks...@gmail.com]
> > Sent: Friday, May 11, 2012 9:51 AM
> > To: common-user@hadoop.apache.org
> > Subject: Question on MapReduce
> >
> > Hi,
> >
> > I am a newbie to Hadoop and have a quick question on the optimal balance
> > of compute vs. storage resources for MapReduce.
> >
> > If I have a multiprocessor node with 4 processors, will Hadoop
> > schedule a higher number of Map or Reduce tasks on that system than on a
> > uni-processor system? In other words, does Hadoop detect denser
> > systems and schedule more tasks on multiprocessor systems?
> >
> > If yes, does that imply that it makes sense to attach higher-capacity
> > storage, to hold a larger number of blocks, on systems with denser compute?
> >
> > Any insights will be very useful.
> >
> > Thanks,
> > Satheesh
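P.S. For anyone else following the thread: my reading of Leo's point about tuning the config on the denser node is the per-tasktracker slot settings in mapred-site.xml. The property names below are the Hadoop 1.0.x ones; the values are illustrative guesses for a 4-core node, not recommendations:

    <!-- mapred-site.xml on the denser node; values are example assumptions -->
    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>8</value>   <!-- assumed: ~2 map slots per core -->
    </property>
    <property>
      <name>mapred.tasktracker.reduce.tasks.maximum</name>
      <value>4</value>   <!-- assumed: roughly half the map slot count -->
    </property>

As Leo says, raising the slot counts only makes the node eligible for more concurrent tasks; the scheduler and data locality still decide what actually lands there.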