Thanks for the reply, Matt. We have 6 TB of raw data, and we are IO bound.
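
For anyone following along, here is a minimal Python sketch of the back-of-the-envelope sizing math from this thread (the OS/JT/DN overhead figures and the cores-to-workers mapping are Matt's estimates below, used purely for illustration):

# Back-of-the-envelope Hadoop sizing math (illustrative only; overhead
# figures follow Matt's estimates in this thread, not measured values).

REPLICATION_FACTOR = 3   # standard HDFS replication
RAW_DATA_TB = 6          # our raw data set

# Disk actually consumed once HDFS replicates every block.
required_storage_tb = RAW_DATA_TB * REPLICATION_FACTOR
print(f"Cluster storage needed: {required_storage_tb} TB")  # 18 TB

def ram_per_worker_gb(total_ram_gb, workers, os_gb=2, jt_gb=1, dn_gb=1):
    """RAM left per worker slot after OS, JT, and DN overhead."""
    return (total_ram_gb - os_gb - jt_gb - dn_gb) / workers

# TYPE A: 2 quad core -> 8 workers, 32 GB RAM, 6 x 1 TB drives
# TYPE B: 4 quad core -> 16 workers, 48 GB RAM, 12 x 1 TB drives
for name, ram, workers, disk_tb in [("A", 32, 8, 6), ("B", 48, 16, 12)]:
    print(f"Type {name}: {ram_per_worker_gb(ram, workers):.2f} GB/worker, "
          f"{disk_tb / workers:.2f} TB/core")

This prints 3.50 GB/worker for Type A and 2.75 GB/worker for Type B, matching Matt's numbers.
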
On Wed, Jun 13, 2012 at 11:44 AM, Matt Davies <m...@mattdavies.net> wrote:

> Sandeep,
>
> I think one critical piece missing is whether or not you are counting the
> 24 TB as raw or as replicated. In a standard environment with a replication
> factor of 3, you really need 72 TB of disk space, which triples your
> hardware requirements.
>
> Regardless, my experience has been to favor A and scale out rather than
> scale up. A simple metric might be that 2 quad cores equate to 8+ worker
> threads, and B would be 16+. So, if you take out 1-2 GB for the OS, 1 GB
> for the JT, and 1 GB for the DN, you have 28/8 (~3.5 GB) for each worker.
> The same overhead on B would be 44/16 (2.75 GB) per worker. This is but
> one metric.
>
> The other is the amount of HD per core. I've heard anywhere from 0.8 to
> 1.5 TB/core, so that would definitely favor A.
>
> Perhaps the biggest factor of all is expected workload. Will you be
> computationally bound or IO bound? I.e., all things being equal
> hardware-wise, will you be spending most of your time crunching or
> reading data?
>
> A few thoughts.
>
> -Matt
>
> On Wed, Jun 13, 2012 at 7:36 AM, Sandeep Reddy P <
> sandeepreddy.3...@gmail.com> wrote:
>
> > Hi,
> > I need to know the difference between the two hardware configurations
> > below for 24 TB of data (slave machines only, for Hadoop, Hive, and Pig):
> >
> > TYPE A: 2 quad core, 32 GB memory, 6 x 1 TB drives (6 TB / machine)
> >
> > TYPE B: 4 quad core, 48 GB memory, 12 x 1 TB drives (12 TB / machine)
> >
> > Suppose we choose 4 Type A machines for 24 TB of data and 2 Type B
> > machines for 24 TB of data. Assuming disk IO speed is constant
> > (7200 RPM SATA), the cost is the same for 4 Type A and 2 Type B machines.
> >
> > I need to know which type of machine will give me the best results in
> > terms of performance.
> >
> > --
> > Thanks,
> > sandeep

--
Thanks,
sandeep