Thanks for the quick responses. I raised the two parameters to 14 (figuring there might be other apps running - like ZooKeeper - that might want some cores of their own). This has made a qualitative difference - the System Monitor now shows much more activity on the squiggly lines, indicating better distribution of the job across the various cores. However, the quantitative difference is insignificant - my job runs only about 4% faster. I hope I don't have to migrate from the Java API to the C++ API.
Alan

-----Original Message-----
From: Mohamed Riadh Trad [mailto:[email protected]]
Sent: Wednesday, September 15, 2010 10:24 AM
To: [email protected]; [email protected]
Subject: EXTERNAL: Re: Making optimum use of cores

Hi Christopher,

I've been working at Sungard (Global Trading); I left 2 years ago... Hope you enjoy working there...

When it comes to performance, you should rather use the C++ API. By setting the map slots per node to the number of virtual CPUs per node, you can fully parallelize jobs and use 1600% of the Nehalem CPU.

Regards,

On 15 Sept. 2010 at 16:00, <[email protected]> <[email protected]> wrote:

> It seems likely that you are only running one (single-threaded) map or reduce
> operation per worker node. Do you know whether you are in fact running
> multiple operations?
>
> This also sounds like it may be a manifestation of a question that I have
> seen a lot on the mailing lists lately, which is that people do not know how
> to increase the number of task slots in their tasktracker configuration.
> This setting is normally controlled via the setting
> mapred.tasktracker.{map|reduce}.tasks.maximum in mapred-site.xml. The
> default of 2 each is probably too low for your servers.
>
> ----- Original Message -----
> From: Ratner, Alan S (IS) <[email protected]>
> To: [email protected] <[email protected]>
> Sent: Wed Sep 15 09:47:47 2010
> Subject: Making optimum use of cores
>
> I'm running Hadoop 0.20.2 on a cluster of servers running Ubuntu 10.4.
> Each server has 2 quad-core Nehalem CPUs, for a total of 8 physical cores
> running as 16 virtual cores. Ubuntu's System Monitor displays 16
> squiggly lines showing usage of the 16 virtual cores. We only seem to
> be making use of one of the 16 virtual cores on any slave node, and even
> on the master node only one virtual core is significantly busy at a
> time. Is there a way to make better use of the cores?
> Presumably I
> could run Hadoop in a VM assigned to each virtual core, but I would think
> there must be a more elegant solution.
>
> Alan Ratner
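For reference, the tasktracker slot settings discussed above live in mapred-site.xml on each worker node and take effect after a tasktracker restart. A minimal sketch using the values Alan mentioned (14 map and 14 reduce slots, leaving headroom for daemons like ZooKeeper) might look like:

```xml
<!-- mapred-site.xml (per worker node): raise task slots from the default of 2 each -->
<configuration>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>14</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>14</value>
  </property>
</configuration>
```

Note that these limits only cap concurrent tasks per node; the job must also supply enough map tasks (e.g. enough input splits) to keep all the slots busy, which may explain a large slot count yielding only a small speedup.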
