On the subject of GigE vs 10 GigE, I think we will very shortly be seeing interest in 10 GigE, since GigE is only ~120 MB/sec - about one hard drive's worth of streaming data. Nodes with 4+ disks are throttled by the network. On a small cluster (20 nodes), the re-replication traffic after a node failure can choke the cluster to death; the only way to fix it quickly is to bring that node back up. Perhaps the HortonWorks guys can work on that.
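To put rough numbers on that, here's a back-of-envelope sketch in Python; the per-disk and per-link figures are assumptions typical of the hardware of the era, not measurements:

    # GigE vs. local disk throughput, using assumed round numbers.
    GIGE_MB_S = 120        # ~practical throughput of a 1 Gbps link
    DISK_MB_S = 100        # assumed sequential rate of one SATA disk
    DISKS_PER_NODE = 4

    local_bw = DISKS_PER_NODE * DISK_MB_S
    print(f"local disks: {local_bw} MB/s, network: {GIGE_MB_S} MB/s")
    print(f"network is the bottleneck by {local_bw / GIGE_MB_S:.1f}x")
    # -> a 4-disk node can stream ~400 MB/s locally but ship only
    #    ~120 MB/s to peers, so non-local work runs at ~1/3 speed
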
-ryan

On Mon, Jun 27, 2011 at 4:38 AM, Steve Loughran <[email protected]> wrote:
> On 26/06/11 20:23, Scott Carey wrote:
>>
>> On 6/23/11 5:49 AM, "Steve Loughran"<[email protected]> wrote:
>>
>>> what's your HW setup? #cores/server, #servers, underlying OS?
>>
>> CentOS 5.6.
>> 4 cores / 8 threads a server (Nehalem generation Intel processor).
>
> that should be enough to find problems. I've just moved up to a 6-core
> 12-thread desktop and that found problems on some non-Hadoop code, which
> shows that the more threads you have, and the faster the machines are,
> the more your race conditions show up. With Hadoop, the fact that you
> can have 10-1000 servers means that in a large cluster the probability
> of that race condition showing up scales up accordingly.
>
>> Also run a smaller cluster with 2x quad core Core 2 generation Xeons.
>>
>> Off topic:
>> The single proc Nehalem is faster than the dual Core 2's for most use
>> cases -- and much lower power. Looking forward to single proc 4 or 6
>> core Sandy Bridge based systems for the next expansion -- testing 4
>> core vs 4 core shows these about 30% faster than the Nehalem generation
>> systems in CPU-bound tasks, at lower power. Intel prices single socket
>> Xeons so much lower than the dual socket ones that the best value for
>> us is to get more single socket servers rather than fewer dual socket
>> ones (with a similar processor to hard drive ratio).
>
> Yes, in a large cluster the price of filling the second socket can
> compare to a lot of storage, and TB of storage is more tangible. I guess
> it depends on your application.
>
> Regarding Sandy Bridge, I've no experience of those, but I worry that
> 10 Gbps is still bleeding edge, and shouldn't be needed for code with
> good locality anyway; it is probably more cost effective to stay at
> 1 Gbps/server, though the issue there is that the # of HDDs/server
> generates lots of replication traffic when a single server fails...
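To put numbers on that last point about replication traffic, here's the same back-of-envelope style of sketch; the node capacity and link speed are assumed for illustration:

    # Assumed: a failed node holding 8 TB (e.g. 4 x 2 TB disks),
    # re-replicated across the 19 survivors of a 20-node GigE cluster.
    FAILED_NODE_TB = 8
    NODES = 20
    GIGE_MB_S = 120                      # assumed usable GigE throughput

    data_mb = FAILED_NODE_TB * 1024**2   # TB -> MB
    # Each lost replica is copied once, so roughly the node's capacity
    # must cross the network. Even if the load spreads perfectly:
    aggregate_bw = (NODES - 1) * GIGE_MB_S
    print(f"{data_mb / aggregate_bw / 60:.0f} min with every link busy")
    # -> ~61 minutes during which job I/O competes with re-replication
    #    on every 1 Gbps link; denser nodes stretch this window further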
