For cost reasons, we just bonded two 1G network ports together using the Linux built-in bonding. A single stream won't go past 1Gbps, but concurrent streams will use both links. The network is only really stressed during 'sort-like' jobs or big replication events anyway.

We also removed some disk bottlenecks by tuning the file systems aggressively -- putting the M/R temp space and the location that jars unpack into on a separate partition helps tremendously. Ext4 can be configured to delay flushing that partition to disk, which greatly reduces I/O for small jobs, since many of the temp files are deleted before they ever get pushed to disk.
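For anyone who wants to try the same thing, here is roughly what it looks like on a CentOS-style box. This is only a sketch: the device names, mount point, bonding mode, and commit interval below are illustrative choices, not necessarily what we run, and the Hadoop property name assumes the 0.20.x line.

  # /etc/modprobe.conf -- enslave two NICs (e.g. eth0/eth1) into bond0
  # balance-alb spreads concurrent flows across both links without
  # needing switch support; a single TCP stream still stays on one NIC
  alias bond0 bonding
  options bond0 mode=balance-alb miimon=100

  # /etc/fstab -- dedicated ext4 partition for M/R temp and unpacked jars
  # noatime avoids access-time writes; commit=120 stretches the journal
  # commit interval so short-lived spill files can be deleted before
  # they ever reach disk
  /dev/sdb1   /data/mr-local   ext4   noatime,commit=120   0   2

  <!-- mapred-site.xml: point the TaskTracker's local dir at that partition -->
  <property>
    <name>mapred.local.dir</name>
    <value>/data/mr-local/mapred</value>
  </property>

The obvious caveat is that delaying flushes means the temp data isn't durable across a crash, but for M/R intermediate output that is fine -- the task just re-runs.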
On 6/27/11 5:10 PM, "Ryan Rawson" <[email protected]> wrote:

>On the subject of gige vs 10-gige, I think that we will very shortly
>be seeing interest in 10gig, since gige is only 120MB/sec - 1 hard
>drive of streaming data. Nodes with 4+ disks are throttled by the
>network. On a small cluster (20 nodes), the replication traffic can
>choke a cluster to death. The only way to fix it quickly is to bring
>that node back up. Perhaps the HortonWorks guys can work on that.
>
>-ryan
>
>On Mon, Jun 27, 2011 at 4:38 AM, Steve Loughran <[email protected]> wrote:
>> On 26/06/11 20:23, Scott Carey wrote:
>>>
>>> On 6/23/11 5:49 AM, "Steve Loughran" <[email protected]> wrote:
>>>
>>>> what's your HW setup? #cores/server, #servers, underlying OS?
>>>
>>> CentOS 5.6.
>>> 4 cores / 8 threads a server (Nehalem generation Intel processor).
>>
>> that should be enough to find problems. I've just moved up to a 6-core 12
>> thread desktop and that found problems on some non-Hadoop code, which shows
>> that the more threads you have, and the faster the machines are, the more
>> your race conditions show up. With Hadoop the fact that you can have 10-1000
>> servers means that in a large cluster the probability of that race condition
>> showing up scales well.
>>
>>> Also run a smaller cluster with 2x quad core Core 2 generation Xeons.
>>>
>>> Off topic:
>>> The single proc Nehalem is faster than the dual core 2's for most use
>>> cases -- and much lower power. Looking forward to single proc 4 or 6 core
>>> Sandy Bridge based systems for the next expansion -- testing 4 core vs 4
>>> core has these 30% faster than the Nehalem generation systems in CPU bound
>>> tasks and lower power. Intel prices single socket Xeons so much lower
>>> than the Dual socket ones that the best value for us is to get more single
>>> socket servers rather than fewer dual socket ones (with similar processor
>>> to hard drive ratio).
>>
>> Yes, in a large cluster the price of filling the second socket can compare
>> to a lot of storage, and TB of storage is more tangible. I guess it depends
>> on your application.
>>
>> Regarding Sandy Bridge, I've no experience of those, but I worry that 10
>> Gbps is still bleeding edge, and shouldn't be needed for code with good
>> locality anyway; it is probably more cost effective to stay at 1Gbps/server,
>> though the issue there is the # of HDDs/server generates lots of replication
>> traffic when a single server fails...
