> > How much RAM do you have?
> >
> > A good rule of thumb is to use 1-1.5G for maps and 2G per reduce
> > (vmem). Ensure your OS has at least 2G of memory.
> >
> > Thus, with 24G and dual quad cores you should be at 8-10m/2r. Scale up
> > if you have more memory.
>
> Would you say RAM was the main factor? We currently have 1G heap per mapper.
> We had heard multiples of 1 disk / 2 core / 4G were good with slightly
> more slots for (mappers + reducers) than cores. Would you agree?
> Can you speak to how we should use hyperthreading, can I treat them as
> separate cores? (I know in virtualisation that the recommendation is
> to disable it but for some other workloads you get 2x performance
> improvement)
>
> Tom
Tom,

I can't speak for other virtualization vendors, but VMware does not recommend disabling HT. Do you have a source that says otherwise (so we can fix it)? The benefit from HT running on vSphere is pretty much the same as what you get from the native OS. I've never seen any workload on any platform get 2X from HT, but I've seen as high as 1.5X.

I'm getting very good results running about one task per logical processor (two per physical core).

Recent virtualized Hadoop performance results are here:
http://www.vmware.com/resources/techresources/10222

Jeff
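To see how the quoted memory rule of thumb and the one-task-per-logical-processor approach interact, here is a minimal back-of-the-envelope sketch. The function names are hypothetical, and the figures (24G node, 2G reserved for the OS, 2 reducers at 2G each, 1-1.5G per map, dual quad-core with two hardware threads per core) are simply the ones from the thread, not measured values.

```python
import math

def max_map_slots(total_gb, os_gb, reduce_slots, reduce_gb, map_gb):
    """Hypothetical helper: largest number of map slots that fits in RAM
    after reserving memory for the OS and for the reduce slots."""
    free_for_maps = total_gb - os_gb - reduce_slots * reduce_gb
    if free_for_maps <= 0:
        return 0
    return math.floor(free_for_maps / map_gb)

def logical_processors(sockets, cores_per_socket, threads_per_core=2):
    """Hypothetical helper: logical processor count with hyperthreading on,
    i.e. the task-slot target if you run one task per logical processor."""
    return sockets * cores_per_socket * threads_per_core

# 24G node, 2G for the OS, 2 reducers at 2G each, maps at 1G or 1.5G each.
for map_gb in (1.0, 1.5):
    print(map_gb, max_map_slots(24, 2, 2, 2, map_gb))

# Dual quad-core with HT enabled.
print(logical_processors(2, 4))
```

Under these assumed figures, 1.5G maps cap out at 12 maps plus 2 reduces (14 slots), below the 16 logical processors, so memory rather than HT would be the binding limit; at 1G per map the memory budget allows 18 maps, and the processor count becomes the tighter constraint.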