Sandeep, I think one critical piece missing is whether you are counting the 24 TB as raw or as replicated data. In a standard environment with a replication factor of 3, you really need 72 TB of disk space, which triples your hardware requirements.
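A back-of-the-envelope sketch in Python, just to make that assumption explicit (the 3x is HDFS's default dfs.replication setting; it doesn't account for MapReduce intermediate/scratch space):

# Rough capacity check: raw data size vs. on-disk footprint after HDFS replication.
# Assumes the default replication factor of 3; ignores intermediate/scratch space.
raw_tb = 24
replication_factor = 3
print(raw_tb * replication_factor, "TB of disk needed")  # -> 72 TB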
Regardless, my experience has been to favor A and scale out rather than scale up. A simple metric: the 2 quad-core box (Type A) equates to 8+ worker threads, and Type B to 16+. So if you take out 1-2 GB for the OS, 1 GB for the JT, and 1 GB for the DN, you have 28 GB / 8 workers (~3.5 GB) per worker. The same overhead on B leaves 44 GB / 16 workers (2.75 GB) per worker. I've put that per-worker arithmetic in a quick Python sketch at the bottom of this mail.

This is but one metric. The other is the amount of HD per core; I've heard anywhere from 0.8 to 1.5 TB/core, so that would definitely favor A.

Perhaps the biggest factor of all is the expected workload. Will you be computationally bound or IO bound? I.e., all things being equal hardware-wise, will you be spending most of your time crunching or reading data?

A few thoughts.

-Matt

On Wed, Jun 13, 2012 at 7:36 AM, Sandeep Reddy P <sandeepreddy.3...@gmail.com> wrote:

> Hi,
> I need to know the difference between the two hardware configurations below
> for 24 TB of data (slave machines only, for Hadoop, Hive, and Pig).
>
> TYPE A: 2 quad core, 32 GB memory, 6 x 1 TB drives (6 TB / machine)
>
> TYPE B: 4 quad core, 48 GB memory, 12 x 1 TB drives (12 TB / machine)
>
> Suppose we choose 4 Type A machines for 24 TB of data and 2 Type B machines
> for 24 TB of data. Assuming disk IO speed is constant (7200 RPM SATA), the
> cost is the same for 4 Type A and 2 Type B machines.
>
> I need to know which type of machine will give me the best results in terms
> of performance.
>
>
> --
> Thanks,
> sandeep
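P.S. Here is the per-worker memory arithmetic as a quick Python sketch, in case you want to tweak the numbers. The 2 GB OS / 1 GB JT / 1 GB DN overheads and the one-worker-per-core assumption are just the rough figures from above, not measured values:

# Rough GB-of-RAM-per-worker comparison for the two node types.
# Daemon overheads (OS, JT, DN) are ballpark assumptions, not measurements.
configs = {
    "Type A": {"cores": 8,  "ram_gb": 32},
    "Type B": {"cores": 16, "ram_gb": 48},
}
overhead_gb = 2 + 1 + 1  # ~2 GB OS + 1 GB JT + 1 GB DN

for name, cfg in configs.items():
    workers = cfg["cores"]  # assume roughly one worker slot per core
    ram_per_worker = (cfg["ram_gb"] - overhead_gb) / workers
    print(f"{name}: {ram_per_worker:.2f} GB per worker")

# Output:
# Type A: 3.50 GB per worker
# Type B: 2.75 GB per worker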