Sandeep,

I think one critical piece missing is whether you are counting the 24 TB as
raw capacity or as replicated capacity.  In a standard environment with a
replication factor of 3 you really need 72 TB of raw disk space, which
triples your hardware requirements.
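
To put numbers on that (a quick sketch, assuming the HDFS default
dfs.replication = 3 and ignoring headroom for intermediate data, logs, and
the OS):

    # Rough raw-capacity estimate for a given logical data size.
    data_tb = 24          # logical, pre-replication data size in TB
    replication = 3       # HDFS default dfs.replication
    print(data_tb * replication)   # -> 72 TB of raw disk needed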

Regardless, my experience has been to favor A and scale out rather than scale
up.  A simple metric: a 2 x quad-core node (A) equates to 8+ worker slots,
and B to 16+.  So, if you take out 1-2 GB for the OS, 1 GB for the
TaskTracker, and 1 GB for the DataNode, you have 28/8 (~3.5 GB) for each
worker on A.  The same overhead on B works out to 44/16 (2.75 GB) per worker.
This is but one metric.
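
Spelled out as a quick calculation (a sketch only; the overheads are the
rough per-daemon figures above, not measured values, and the helper name is
just for illustration):

    # RAM left per worker slot after OS, TaskTracker, and DataNode overhead.
    def gb_per_slot(total_gb, slots, os_gb=2, tt_gb=1, dn_gb=1):
        return (total_gb - os_gb - tt_gb - dn_gb) / slots

    print(gb_per_slot(32, 8))    # Type A: 28/8  = ~3.5 GB per slot
    print(gb_per_slot(48, 16))   # Type B: 44/16 =  2.75 GB per slot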

The other is the amount of disk per core.  I've heard anywhere from 0.8 to
1.5 TB/core, so that would definitely favor A.

Perhaps the biggest factor of all is the expected workload.  Will you be
computationally bound or I/O bound?  I.e., all things being equal
hardware-wise, will you be spending most of your time crunching or reading
data?

A few thoughts.

-Matt

On Wed, Jun 13, 2012 at 7:36 AM, Sandeep Reddy P <
sandeepreddy.3...@gmail.com> wrote:

> Hi,
>  I need to know the difference between the two hardware configurations
> below for 24 TB of data (slave machines only, for Hadoop, Hive, and Pig).
>
> TYPE A: 2 quad core, 32 GB memory, 6 x 1 TB drives (6 TB / machine)
>
> TYPE B: 4 quad core, 48 GB memory, 12 x 1 TB drives (12 TB / machine)
>
> Suppose we choose 4 Type A machines for 24 TB of data, or 2 Type B machines
> for 24 TB of data.  Assume disk I/O speed is constant (7200 RPM SATA) and
> the cost is the same for 4 Type A and 2 Type B machines.
>
> I need to know which type of machine will give me the best results in terms
> of performance.
>
>
> --
> Thanks,
> sandeep
>
