Come to Srivas' talk at the Summit.

On Mon, Jun 27, 2011 at 5:10 PM, Ryan Rawson <[email protected]> wrote:
> On the subject of gige vs 10-gige, I think that we will very shortly
> be seeing interest in 10gig, since gige is only 120MB/sec -- 1 hard
> drive of streaming data. Nodes with 4+ disks are throttled by the
> network. On a small cluster (20 nodes), the replication traffic can
> choke a cluster to death. The only way to fix it quickly is to bring
> that node back up. Perhaps the HortonWorks guys can work on that.
>
> -ryan
>
> On Mon, Jun 27, 2011 at 4:38 AM, Steve Loughran <[email protected]> wrote:
> > On 26/06/11 20:23, Scott Carey wrote:
> >>
> >> On 6/23/11 5:49 AM, "Steve Loughran" <[email protected]> wrote:
> >>
> >>> what's your HW setup? #cores/server, #servers, underlying OS?
> >>
> >> CentOS 5.6.
> >> 4 cores / 8 threads a server (Nehalem generation Intel processor).
> >
> > That should be enough to find problems. I've just moved up to a 6-core,
> > 12-thread desktop, and that found problems in some non-Hadoop code, which
> > shows that the more threads you have, and the faster the machines are, the
> > more your race conditions show up. With Hadoop, the fact that you can have
> > 10-1000 servers means that in a large cluster the probability of that race
> > condition showing up rises quickly.
> >
> >> Also run a smaller cluster with 2x quad core Core 2 generation Xeons.
> >>
> >> Off topic:
> >> The single proc Nehalem is faster than the dual-socket Core 2s for most
> >> use cases -- and much lower power. Looking forward to single proc 4 or 6
> >> core Sandy Bridge based systems for the next expansion -- testing 4 core
> >> vs 4 core has these 30% faster than the Nehalem generation systems in
> >> CPU-bound tasks, and lower power. Intel prices single socket Xeons so
> >> much lower than the dual socket ones that the best value for us is to get
> >> more single socket servers rather than fewer dual socket ones (with a
> >> similar processor-to-hard-drive ratio).
> >
> > Yes, in a large cluster the price of filling the second socket can compare
> > to a lot of storage, and a TB of storage is more tangible. I guess it
> > depends on your application.
> >
> > Regarding Sandy Bridge, I've no experience of those, but I worry that 10
> > Gbps is still bleeding edge, and shouldn't be needed for code with good
> > locality anyway; it is probably more cost effective to stay at
> > 1 Gbps/server, though the issue there is that the number of HDDs per
> > server generates lots of replication traffic when a single server fails...
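For anyone wanting to sanity-check Ryan's numbers, here is a rough
back-of-envelope sketch in Python. The disk throughput, per-node capacity,
and cluster size are illustrative assumptions, not measurements:

    # Back-of-envelope: when does the NIC, not the disks, become the bottleneck?
    # All figures below are illustrative assumptions, not measurements.

    GIGE_MB_S = 1000 / 8          # 1 Gb/s link ~= 125 MB/s raw, ~120 MB/s usable
    DISK_STREAM_MB_S = 100        # one 2011-era SATA drive, sequential (assumed)
    DISKS_PER_NODE = 4
    NODES = 20
    NODE_DATA_TB = 4              # data stored on the failed node (assumed)

    # Aggregate disk bandwidth vs. the network pipe:
    disk_bw = DISKS_PER_NODE * DISK_STREAM_MB_S   # 400 MB/s of spindles
    print(f"disks: {disk_bw} MB/s vs NIC: {GIGE_MB_S:.0f} MB/s "
          f"-> network-bound by {disk_bw / GIGE_MB_S:.1f}x")

    # Re-replication after a node dies: the survivors copy the dead node's
    # blocks among themselves, each limited to the same 1 GigE link.
    node_data_mb = NODE_DATA_TB * 1_000_000
    cluster_ingest_mb_s = (NODES - 1) * GIGE_MB_S  # upper bound, no contention
    hours = node_data_mb / cluster_ingest_mb_s / 3600
    print(f"re-replicating {NODE_DATA_TB} TB over {NODES - 1} GigE links: "
          f"~{hours:.1f} h of saturated network, best case")

Even in the best case, every surviving link runs saturated for the duration,
which is the "choke a cluster to death" effect Ryan describes.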

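Steve's point about race conditions surfacing at scale follows from simple
probability: if each node independently trips a latent race with some small
chance p during a job, the chance that at least one node trips it is
1 - (1 - p)^n. A tiny sketch, with p picked purely for illustration:

    # A race that one node hits with probability p during a job is hit by
    # *someone* among n independent nodes with probability 1 - (1 - p)**n.
    # p here is a made-up illustrative number.
    p = 1e-4
    for n in (1, 10, 100, 1000):
        print(f"{n:>5} nodes -> P(race observed) = {1 - (1 - p)**n:.4f}")

A bug that is a 1-in-10,000 event on a single machine becomes roughly a
1-in-10 event per job on a 1000-node cluster.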