Hi Kumar,

Thanks for your reply. There are two fundamental questions:

1. What are the advantages of running multiple data nodes on a single computer, given that one data node instance can read multiple data directories too?
2. Are there advantages to running multiple task trackers? I think the point of running multiple task trackers is to use more CPU cores. Can a larger value such as 4 or 8 for mapred.tasktracker.reduce.tasks.maximum and mapred.tasktracker.map.tasks.maximum, with only one task tracker, achieve the same purpose?

Regards,

Xiaobo Gu

On Mon, Jul 4, 2011 at 12:34 PM, Kumar Kandasami <kumaravel.kandas...@gmail.com> wrote:
> Xiaobo Gu:
>
> Since you are trying an approach that is not commonly used, here is
> how I would approach this problem:
>
> Experiment with different configuration settings, and measure the
> performance of the cluster using a Terabyte sort job submitted to the
> cluster. That gives you a benchmark to see whether a change has
> improved cluster performance.
>
> I am not sure whether you need multiple task trackers, given that you
> are running multiple data nodes on the same machine.
>
> The mapred.tasktracker.map.tasks.maximum attribute determines how many
> separate JVMs will be opened to run the map tasks.
> The memory allocated for each task can be controlled by
> mapred.child.java.opts.
>
> So say you have 32 processors: then generally you can run 64 JVM
> processes, allocating one to each of the Hadoop daemons you have:
>
> SecondaryNameNode - 1
> NameNode - 1
> JobTracker - 1
> DataNode - 8
> TaskTracker - 1
>
> That leaves a maximum of 52 JVM processes that you could allocate
> between map and reduce tasks. You could allocate a maximum of
> approximately 3-4 GB per map/reduce task (if needed).
>
> Concern: a larger number of map/reduce tasks might cause a bottleneck
> in disk access; it might depend on the HDD configuration too.
>
> Hope these comments help you succeed further. Please keep me posted
> with your benchmark results and configuration.
>
> Kumar    _/|\_
> www.saisk.com
> ku...@saisk.com
> "making a profound difference with knowledge and creativity..."
>
> On Sun, Jul 3, 2011 at 10:30 PM, Xiaobo Gu <guxiaobo1...@gmail.com> wrote:
>>
>> Hi Kumar,
>>
>> Thanks for your reply. Can you have a look at this, please?
>>
>> Regards,
>>
>> Xiaobo Gu
>>
>> ---------- Forwarded message ----------
>> From: Xiaobo Gu <guxiaobo1...@gmail.com>
>> Date: Sun, Jul 3, 2011 at 10:07 PM
>> Subject: Re: FW: How to run multiple data nodes and multiple task
>> trackers on single server.
>> To: mapreduce-user@hadoop.apache.org
>>
>> Hi Harsh,
>>
>> I have successfully run 2 data nodes in a single virtual machine,
>> and we will deploy 4 or 8 data nodes on our big SMP server, which has
>> 32 CPU cores and 256 GB of RAM. In order to take full advantage of all
>> the resources, do we need to configure more task trackers too, or can
>> we set mapred.tasktracker.map.tasks.maximum and
>> mapred.tasktracker.reduce.tasks.maximum to a larger number such as 8
>> or 16 to achieve the same purpose?
>>
>> We have seen this
>> http://developer.yahoo.com/events/hadoopsummit2011/agenda.html#21, but
>> have not found any more details. We think a multiple data node
>> configuration on big SMP servers is a good point to start with.
>>
>> Regards,
>>
>> Xiaobo Gu
>>
>> On Sun, Jul 3, 2011 at 9:56 PM, Harsh J <ha...@cloudera.com> wrote:
>> > On Sun, Jul 3, 2011 at 9:41 AM, XiaoboGu <guxiaobo1...@gmail.com> wrote:
>> >>> Hi,
>> >>>
>> >>> Do we have to run multiple task trackers when running multiple data
>> >>> nodes on a single computer?
>> >>>
>> >>> Regards,
>> >>>
>> >>> Xiaobo Gu
>> >>>
>> >
>> > Do we _have_ to? --> No, it's a matter of your choice if you want
>> > MapReduce daemons running along. They are not coupled.
>> >
>> > Regarding your original question, what's the string of "$DN_CONF_OPTS"
>> > being passed?
>> >
>> > --
>> > Harsh J
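
The slot arithmetic in Kumar's quoted reply can be checked with a short sketch. It assumes his rule of thumb of two JVM processes per processor (32 processors, 64 JVMs) and the daemon counts from his list. The variable and function names are illustrative only and are not Hadoop settings.

```python
# Rule of thumb from the thread: roughly 2 JVM processes per processor.
PROCESSORS = 32
JVMS_PER_PROCESSOR = 2  # assumption derived from "32 processors -> 64 JVM"

# One JVM per daemon instance, using the counts listed in the thread.
DAEMONS = {
    "SecondaryNameNode": 1,
    "NameNode": 1,
    "JobTracker": 1,
    "DataNode": 8,
    "TaskTracker": 1,
}


def task_jvm_budget(processors, jvms_per_processor, daemons):
    """Return how many JVMs remain for map/reduce tasks after the daemons."""
    total_jvms = processors * jvms_per_processor
    return total_jvms - sum(daemons.values())


budget = task_jvm_budget(PROCESSORS, JVMS_PER_PROCESSOR, DAEMONS)
print(budget)  # 52
```

With a single task tracker, the sum of mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum would have to stay within this budget. How to split it between map and reduce slots depends on the workload and is best settled by the benchmark runs suggested in the thread.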