Re: parallel mapping on single server
Hi,

I have a question about the strategy described by Joman Chu:

"Hadoop will try to split the file according to how it is split up in the HDFS."

Take wordcount as an example. Suppose "hadoop" is a word in the input file, and block 1 ends with "had" while block 2 starts with "oop". How is this case handled?

Thanks for your reply.

On 2008-7-11, at 5:27 AM, Joman Chu wrote:

Hadoop will try to split the file according to how it is split up in the HDFS. For example, if an input file has three blocks with a replication factor of two, there are six block replicas in total. Say there are six machines, each holding a single replica. Block 1 is on machines 1 and 2, block 2 is on 3 and 4, and block 3 is on 5 and 6. Hadoop will make three Map tasks. Each task is assigned to a machine and processes the block that is stored locally on that machine. If it can't do this, the block is transferred from another machine in the same rack and, failing that, from machines farther away in the cluster.

Joman Chu
AIM: ARcanUSNUMquam
IRC: irc.liquid-silver.net

On Thu, Jul 10, 2008 at 10:40 AM, hong [EMAIL PROTECTED] wrote:

Hi,

Following up on Cao Haijun's reply: suppose we have set 8 map tasks. How does each map know which part of the input file it should process?

On 2008-7-10, at 2:33 AM, Haijun Cao wrote:

Set the number of map slots per tasktracker to 8 in order to run 8 map tasks on one machine at the same time (assuming one tasktracker per machine):

    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>8</value>
      <description>The maximum number of map tasks that will be run simultaneously by a task tracker.</description>
    </property>

-----Original Message-----
From: Deepak Diwakar [mailto:[EMAIL PROTECTED]]
Sent: Monday, July 07, 2008 1:29 AM
To: core-user@hadoop.apache.org
Subject: parallel mapping on single server

Hi,

I am pretty new to Hadoop. I ran a modification of wordcount on almost a TB of data on a single server, but found that it takes too much time. I found that only one core is utilized at a time, even though my server has 8 cores.

I read that Hadoop speeds up computation in DFS mode. But how do I make full use of a single server with multicore processors? Is there a pseudo-DFS mode in Hadoop? What changes are required in the config files? Please let me know in detail. Does it have anything to do with hadoop-site.xml and mapred-default.xml?

Thanks in advance.

--
Deepak Diwakar,
Associate Software Eng., Pubmatic, Pune
Contact: +919960930405
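On the question at the top of this message (a word cut in half at a block boundary): for line-oriented text input, the usual approach is that a reader whose split does not start at byte 0 skips ahead to the first line break, and every reader finishes the last line it started even if that means reading past the end of its own split. The sketch below is a simplified Python simulation of that idea, not Hadoop's actual code; `make_splits` and `read_split` are hypothetical names chosen for illustration, and records are assumed to be newline-terminated lines.

```python
def make_splits(total: int, split_size: int) -> list[tuple[int, int]]:
    """Cut a file of `total` bytes into (offset, length) pairs."""
    return [(off, min(split_size, total - off))
            for off in range(0, total, split_size)]


def read_split(data: bytes, start: int, length: int) -> list[bytes]:
    """Return the lines owned by the split [start, start + length)."""
    end = start + length
    pos = start
    if start != 0:
        # Not the first split: the previous reader owns the line we landed
        # in, so skip through the next newline before reading anything.
        nl = data.find(b"\n", start)
        if nl == -1:
            return []
        pos = nl + 1
    lines = []
    # Read every line that begins at or before `end`, running past `end`
    # when that is needed to finish the final line.
    while pos <= end and pos < len(data):
        nl = data.find(b"\n", pos)
        stop = len(data) if nl == -1 else nl
        lines.append(data[pos:stop])
        pos = stop + 1
    return lines


data = b"hello hadoop world\nsecond line here\nthird\n"
splits = make_splits(len(data), 9)  # byte 9 falls inside the word "hadoop"
per_split = [read_split(data, s, n) for s, n in splits]
all_lines = [line for lines in per_split for line in lines]
hadoop_count = sum(line.split().count(b"hadoop") for line in all_lines)
print(per_split)
print(hadoop_count)
```

Here the first split ends with "had" and the second begins with "oop", yet the whole first line is read by the first reader and skipped by the second, so the word is counted exactly once. The price is that a reader may fetch a few bytes that live in the next block, possibly on another machine.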
Re: parallel mapping on single server
Hi,

Following up on Cao Haijun's reply: suppose we have set 8 map tasks. How does each map know which part of the input file it should process?

On 2008-7-10, at 2:33 AM, Haijun Cao wrote:

Set the number of map slots per tasktracker to 8 in order to run 8 map tasks on one machine at the same time (assuming one tasktracker per machine):

    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>8</value>
      <description>The maximum number of map tasks that will be run simultaneously by a task tracker.</description>
    </property>

-----Original Message-----
From: Deepak Diwakar [mailto:[EMAIL PROTECTED]]
Sent: Monday, July 07, 2008 1:29 AM
To: core-user@hadoop.apache.org
Subject: parallel mapping on single server

Hi,

I am pretty new to Hadoop. I ran a modification of wordcount on almost a TB of data on a single server, but found that it takes too much time. I found that only one core is utilized at a time, even though my server has 8 cores.

I read that Hadoop speeds up computation in DFS mode. But how do I make full use of a single server with multicore processors? Is there a pseudo-DFS mode in Hadoop? What changes are required in the config files? Please let me know in detail. Does it have anything to do with hadoop-site.xml and mapred-default.xml?

Thanks in advance.

--
Deepak Diwakar,
Associate Software Eng., Pubmatic, Pune
Contact: +919960930405
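A note on the question in this message: mapred.tasktracker.map.tasks.maximum limits how many map tasks run at once on a tasktracker; it does not fix the number of map tasks. Each map task is handed one input split, which is essentially a byte range (offset and length) of an input file. The sketch below shows the idea in Python under simplifying assumptions: `compute_splits` is a hypothetical helper, the split size is taken to equal the block size, and the 64 MB figure is an assumed block size. The real split calculation also takes other configuration values into account, which are left out here.

```python
def compute_splits(file_length: int, split_size: int) -> list[tuple[int, int]]:
    """Return one (offset, length) byte range per map task."""
    splits = []
    offset = 0
    while offset < file_length:
        # The last split may be shorter than split_size.
        length = min(split_size, file_length - offset)
        splits.append((offset, length))
        offset += length
    return splits


MB = 1024 * 1024
block_size = 64 * MB  # assumed block size, for illustration only

# A file that occupies exactly three blocks yields three map tasks.
three_blocks = compute_splits(3 * block_size, block_size)

# A file that is not a multiple of the split size gets a short final split.
small = compute_splits(150, 64)

print(len(three_blocks))
print(small)
```

Each task then opens the file, seeks to its offset, and reads roughly `length` bytes, so no coordination between map tasks is needed beyond handing out the ranges.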
RE: parallel mapping on single server
Set the number of map slots per tasktracker to 8 in order to run 8 map tasks on one machine at the same time (assuming one tasktracker per machine):

    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>8</value>
      <description>The maximum number of map tasks that will be run simultaneously by a task tracker.</description>
    </property>

-----Original Message-----
From: Deepak Diwakar [mailto:[EMAIL PROTECTED]]
Sent: Monday, July 07, 2008 1:29 AM
To: core-user@hadoop.apache.org
Subject: parallel mapping on single server

Hi,

I am pretty new to Hadoop. I ran a modification of wordcount on almost a TB of data on a single server, but found that it takes too much time. I found that only one core is utilized at a time, even though my server has 8 cores.

I read that Hadoop speeds up computation in DFS mode. But how do I make full use of a single server with multicore processors? Is there a pseudo-DFS mode in Hadoop? What changes are required in the config files? Please let me know in detail. Does it have anything to do with hadoop-site.xml and mapred-default.xml?

Thanks in advance.

--
Deepak Diwakar,
Associate Software Eng., Pubmatic, Pune
Contact: +919960930405
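To get a feel for what the slot setting in the reply above changes, here is a rough back-of-the-envelope calculation in Python. It is only a sketch: `map_waves` is a hypothetical helper, the input is rounded to 1 TiB, and a 64 MiB split size is an assumption, not a figure from this thread. With N slots, at most N map tasks run at once, so the tasks are worked through in successive "waves".

```python
def map_waves(input_bytes: int, split_bytes: int, slots: int) -> tuple[int, int]:
    """Return (number of map tasks, number of waves) using ceiling division."""
    tasks = -(-input_bytes // split_bytes)  # one task per split
    waves = -(-tasks // slots)              # at most `slots` tasks run at once
    return tasks, waves


TIB = 2 ** 40
SPLIT = 64 * 2 ** 20  # assumed 64 MiB split size

tasks_1, waves_1 = map_waves(TIB, SPLIT, slots=1)
tasks_8, waves_8 = map_waves(TIB, SPLIT, slots=8)
print(tasks_1, waves_1)
print(tasks_8, waves_8)
```

Under these assumptions the job has 16384 map tasks either way; with one slot they run one after another, while with eight slots they finish in 2048 waves. Whether that translates into an eightfold speedup depends on whether disk I/O or the CPU is the bottleneck on the machine.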