I have a similar question, but about disk speed.
Can I assume a linear improvement going from 7200 rpm -> 10k rpm -> 15k rpm?
How much of a bottleneck is disk access?
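
For the rotational part at least, the arithmetic is easy to sketch. This
is only a rough illustration in Python, assuming that average rotational
latency is the time for half a revolution. The function name is my own, and
it ignores seek time and sequential transfer rate, so it says nothing
definitive about real Hadoop throughput:

```python
def avg_rotational_latency_ms(rpm):
    """Average rotational latency in ms: the time for half a revolution."""
    ms_per_revolution = 60_000.0 / rpm  # 60,000 ms in a minute
    return 0.5 * ms_per_revolution


# Compare the three spindle speeds asked about above.
for rpm in (7200, 10_000, 15_000):
    print(f"{rpm:>6} rpm: {avg_rotational_latency_ms(rpm):.2f} ms")
```

That gives roughly 4.17 ms, 3.00 ms and 2.00 ms, so this one component
scales with 1/rpm. Whether the overall job time follows is a separate
question, since large sequential reads depend more on transfer rate.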

Another question is regarding hardware redundancy. What is the relative
value of the following:
- RAID / hot-swappable drives
- dual NICs
- redundant backplane
- redundant power supply
- UPS

I've been assuming that RAID is generally a good idea (disks fail quite
often, and it's cheaper to hot-swap a drive than to rebuild an entire box).
Dual NICs also seem worthwhile, as both can be used at the same time. I
assume everything else is unnecessary in a Hadoop cluster.

On Thu, Apr 2, 2009 at 11:33 AM, tim robertson <[email protected]> wrote:

> Thanks Miles,
>
> Thus far most of my work has been on EC2 large instances and *mostly*
> my code is not memory intensive (I sometimes do joins against polygons
> and hold Geospatial indexes in memory, but am aware of keeping things
> within the -Xmx for this).
> I am mostly looking to move routine data processing and
> transformation (lots of distinct, count and group by operations) off a
> chunky MySQL DB (200 million rows and growing) which gets all locked
> up.
>
> We have gigabit switches.
>
> Cheers
>
> Tim
>
>
>
> On Thu, Apr 2, 2009 at 4:15 PM, Miles Osborne <[email protected]> wrote:
> > make sure you also have a fast switch, since you will be transmitting
> > data across your network and this will come to bite you otherwise
> >
> > (roughly, you need one core per hadoop-related job, each mapper, task
> > tracker etc;  the per-core memory may be too small if you are doing
> > anything memory-intensive.  we have 8-core boxes with 50 -- 33 GB RAM
> > and 8 x 1 TB disks on each one;  one box however just has 16 GB of RAM
> > and it routinely falls over when we run jobs on it)
> >
> > Miles
> >
> > 2009/4/2 tim robertson <[email protected]>:
> >> Hi all,
> >>
> >> I am not a hardware guy but about to set up a 10 node cluster for some
> >> processing of (mostly) tab files, generating various indexes and
> >> researching HBase, Mahout, pig, hive etc.
> >>
> >> Could someone please sanity check that these specs look sensible?
> >> [I know 4 drives would be better but price is a factor (second hand
> >> not an option, hosting is not either as there is very good bandwidth
> >> provided)]
> >>
> >> Something along the lines of:
> >>
> >> Dell R200 (8GB is max memory)
> >> Quad Core Intel® Xeon® X3360, 2.83GHz, 2x6MB Cache, 1333MHz FSB
> >> 8GB Memory, DDR2, 800MHz (4x2GB Dual Ranked DIMMs)
> >> 2x 500GB 7.200 rpm 3.5-inch SATA Hard Drive
> >>
> >>
> >> Dell R300 (can be expanded to 24GB RAM)
> >> Quad Core Intel® Xeon® X3363, 2.83GHz, 2x6MB Cache, 1333MHz FSB
> >> 8GB Memory, DDR2, 667MHz (2x4GB Dual Ranked DIMMs)
> >> 2x 500GB 7.200 rpm 3.5-inch SATA Hard Drive
> >>
> >>
> >> If there is a major flaw please can you let me know.
> >>
> >> Thanks,
> >>
> >> Tim
> >> (not a hardware guy ;o)
> >>
> >
> >
> >
> > --
> > The University of Edinburgh is a charitable body, registered in
> > Scotland, with registration number SC005336.
> >
>
