I have a similar question, but about disk speed. Can I assume a linear improvement going from 7200 rpm -> 10k rpm -> 15k rpm? How much of a bottleneck is disk access?
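
As a rough point of reference for the rpm question, here is a minimal sketch of the one number that follows directly from spindle speed: average rotational latency, i.e. the time for half a revolution. The function name is my own, and this deliberately ignores seek time, transfer rate, and caching, so it says nothing definitive about end-to-end Hadoop throughput.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency in milliseconds.

    On average the platter must turn half a revolution before the
    requested sector passes under the head. One revolution takes
    60,000 ms / rpm, so half of that is the average wait.
    """
    return 0.5 * 60_000.0 / rpm


if __name__ == "__main__":
    # Spindle speeds from the question above.
    for rpm in (7200, 10_000, 15_000):
        print(f"{rpm:>6} rpm: {avg_rotational_latency_ms(rpm):.2f} ms")
```

By this measure the rotational component falls in proportion to 1/rpm (about 4.17 ms, 3.00 ms and 2.00 ms respectively), but that is only one part of disk access time, so it does not by itself imply a linear improvement in job run time.
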

Another question is regarding hardware redundancy. What is the relative value of the following:
- RAID / hot-swappable drives
- dual NICs
- redundant backplane
- redundant power supply
- UPS

I've been assuming that RAID is generally a good idea (disks fail quite often, and it's cheaper to hotswap a drive than to rebuild an entire box). Dual NICs are also good, as both can be used at the same time. Everything else is not necessary in a Hadoop cluster.

On Thu, Apr 2, 2009 at 11:33 AM, tim robertson <[email protected]> wrote:
> Thanks Miles,
>
> Thus far most of my work has been on EC2 large instances and *mostly*
> my code is not memory intensive (I sometimes do joins against polygons
> and hold Geospatial indexes in memory, but am aware of keeping things
> within the -Xmx for this).
> I am mostly looking to move routine data processing and
> transformation (lots of distinct, count and group by operations) off a
> chunky mysql DB (200million rows and growing) which gets all locked
> up.
>
> We have gigabit switches.
>
> Cheers
>
> Tim
>
> On Thu, Apr 2, 2009 at 4:15 PM, Miles Osborne <[email protected]> wrote:
> > make sure you also have a fast switch, since you will be transmitting
> > data across your network and this will come to bite you otherwise
> >
> > (roughly, you need one core per hadoop-related job, each mapper, task
> > tracker etc; the per-core memory may be too small if you are doing
> > anything memory-intensive. we have 8-core boxes with 50 -- 33 GB RAM
> > and 8 x 1 TB disks on each one; one box however just has 16 GB of RAM
> > and it routinely falls over when we run jobs on it)
> >
> > Miles
> >
> > 2009/4/2 tim robertson <[email protected]>:
> >> Hi all,
> >>
> >> I am not a hardware guy but about to set up a 10 node cluster for some
> >> processing of (mostly) tab files, generating various indexes and
> >> researching HBase, Mahout, pig, hive etc.
> >>
> >> Could someone please sanity check that these specs look sensible?
> >> [I know 4 drives would be better but price is a factor (second hand
> >> not an option, hosting is not either as there is very good bandwidth
> >> provided)]
> >>
> >> Something along the lines of:
> >>
> >> Dell R200 (8GB is max memory)
> >> Quad Core Intel® Xeon® X3360, 2.83GHz, 2x6MB Cache, 1333MHz FSB
> >> 8GB Memory, DDR2, 800MHz (4x2GB Dual Ranked DIMMs)
> >> 2x 500GB 7.200 rpm 3.5-inch SATA Hard Drive
> >>
> >>
> >> Dell R300 (can be expanded to 24GB RAM)
> >> Quad Core Intel® Xeon® X3363, 2.83GHz, 2x6M Cache, 1333MHz FS
> >> 8GB Memory, DDR2, 667MHz (2x4GB Dual Ranked DIMMs)
> >> 2x 500GB 7.200 rpm 3.5-inch SATA Hard Drive
> >>
> >>
> >> If there is a major flaw please can you let me know.
> >>
> >> Thanks,
> >>
> >> Tim
> >> (not a hardware guy ;o)
> >>
> >
> > --
> > The University of Edinburgh is a charitable body, registered in
> > Scotland, with registration number SC005336.
