Thanks for the reply, Matt.
We have 6 TB of raw data, and we are I/O bound.
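
Since the sizing math below is the crux, here is a quick back-of-envelope
Python sketch of it (the replication factor of 3 and the ~4 GB per-node
daemon overhead are assumptions taken from your reply, not measured values):

# Back-of-envelope Hadoop slave sizing, per the numbers in this thread.
RAW_TB = 24           # raw data to store
REP_FACTOR = 3        # standard HDFS replication factor (assumed)
OVERHEAD_GB = 4       # ~1-2 GB OS + 1 GB TaskTracker + 1 GB DataNode

print("Replicated disk needed: %d TB" % (RAW_TB * REP_FACTOR))  # 72 TB

# (node type, RAM in GB, worker slots); slot counts follow the
# one-slot-per-core rule of thumb from the reply below.
for name, ram_gb, workers in [("Type A", 32, 8), ("Type B", 48, 16)]:
    per_worker = (ram_gb - OVERHEAD_GB) / float(workers)
    print("%s: %.2f GB per worker" % (name, per_worker))

That reproduces the 72 TB replicated footprint and the 3.5 GB vs 2.75 GB
per-worker figures discussed below.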

On Wed, Jun 13, 2012 at 11:44 AM, Matt Davies <m...@mattdavies.net> wrote:

> Sandeep,
>
> I think one critical piece missing is whether you are counting the 24 TB
> as raw or as replicated.  In a standard environment with a replication
> factor of 3 you really need 72 TB of disk space, which triples your
> hardware requirements.
>
> Regardless, my experience has been to favor A and scale out rather than
> scale up.  A simple metric: the 2 quad-cores in A equate to 8+ worker
> slots, and the 4 in B to 16+.  So, if you take out 1-2 GB for the OS,
> 1 GB for the TT, and 1 GB for the DN, A has 28/8 (~3.5 GB) per worker.
> The same overhead on B leaves 44/16 (2.75 GB) per worker.  This is but
> one metric.
>
> The other is the amount of disk per core.  I've heard anywhere from 0.8
> to 1.5 TB per core, so that would definitely favor A.
>
> Perhaps the biggest factor of all is the expected workload.  Will you be
> computationally bound or I/O bound?  I.e., all things being equal
> hardware-wise, will you be spending most of your time crunching or
> reading data?
>
> A few thoughts.
>
> -Matt
>
> On Wed, Jun 13, 2012 at 7:36 AM, Sandeep Reddy P <
> sandeepreddy.3...@gmail.com> wrote:
>
> > Hi,
> > I need to know the difference between the two hardware configurations
> > below for 24 TB of data (slave machines only, for Hadoop, Hive, and Pig).
> >
> > TYPE A: 2 quad-core CPUs, 32 GB memory, 6 x 1 TB drives (6 TB/machine)
> >
> > TYPE B: 4 quad-core CPUs, 48 GB memory, 12 x 1 TB drives (12 TB/machine)
> >
> > Suppose we choose 4 Type A machines for 24 TB of data, or 2 Type B
> > machines for 24 TB of data.  Assume disk I/O speed is constant (7200 RPM
> > SATA) and that the cost is the same for 4 Type A and 2 Type B machines.
> >
> > I need to know which type of machine will give the best results in
> > terms of performance.
> >
> >
> > --
> > Thanks,
> > sandeep
> >
>



-- 
Thanks,
sandeep
