Hi,

I have a question about the strategy described by Joman Chu:
"Hadoop will try to split the file according to how it is split up in the HDFS"

Take wordcount as an example. Suppose "hadoop" is a word in the input file, block 1 ends with "had", and block 2 starts with "oop". How is this case handled?

Thanks for your reply

On 2008-7-11, at 5:27 AM, Joman Chu wrote:

Hadoop will try to split the file according to how it is split up in
the HDFS. For example, if an input file has three blocks with a
replication factor of two, there are six block replicas in total. Say
there are six machines, each holding a single replica. Block 1 is on
machines 1 and 2, block 2 is on 3 and 4, and block 3 is on 5 and 6.
Hadoop will make three Map tasks, one per block. Each task is assigned
to a machine and processes the block that is stored locally on that
machine. If that isn't possible, the block is transferred from another
machine in the same rack, and failing that from a machine further away
in the cluster.
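
For line-oriented text input, splitting by block does not cut a record
in half, even when a word such as "hadoop" straddles a block boundary.
The sketch below is a simplified Python model of that behavior, not
Hadoop's actual code: every split except the first skips ahead to the
first line break, and every split keeps reading past its own end to
finish the last line it started. The names make_splits and read_split
and the 10-byte "block size" are made up for illustration.

```python
def make_splits(size, block_size):
    """Return (start, length) pairs, one per block-sized split."""
    return [(s, min(block_size, size - s)) for s in range(0, size, block_size)]


def read_split(data, start, length):
    """Return the complete lines that belong to the split [start, start+length)."""
    end = start + length
    pos = start
    if start != 0:
        # Not the first split: skip through the first newline at or after
        # `start`. That (partial) line belongs to the previous split.
        nl = data.find(b"\n", start)
        pos = len(data) if nl == -1 else nl + 1
    lines = []
    # Read every line that starts at or before `end`, even if the line
    # itself runs past `end` into the next block.
    while pos <= end and pos < len(data):
        nl = data.find(b"\n", pos)
        stop = len(data) if nl == -1 else nl
        lines.append(data[pos:stop])
        pos = stop + 1
    return lines


# Block 1 ends with "had", block 2 starts with "oop".
data = b"i like hadoop\nhadoop is fun\nbye\n"
per_split = [read_split(data, s, n) for s, n in make_splits(len(data), 10)]
for i, lines in enumerate(per_split):
    print(i, lines)
```

Under this model the first reader pulls the remaining bytes of
"i like hadoop" from the second block, and the second reader starts at
the next full line, so each line reaches exactly one map task.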

Joman Chu
AIM: ARcanUSNUMquam
IRC: irc.liquid-silver.net


On Thu, Jul 10, 2008 at 10:40 AM, hong <[EMAIL PROTECTED]> wrote:
Hi

Following up on Haijun Cao's reply:

Suppose we have set 8 map tasks. How does each map know which part of the
input file it should process?

On 2008-7-10, at 2:33 AM, Haijun Cao wrote:

Set the number of map slots per tasktracker to 8 to run 8 map tasks at the same time on one machine (assuming one tasktracker per machine):


<property>
 <name>mapred.tasktracker.map.tasks.maximum</name>
 <value>8</value>
 <description>The maximum number of map tasks that will be run
 simultaneously by a task tracker.
 </description>
</property>
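
Note that this setting limits how many map tasks run at once; it does
not set how many map tasks the job has. As a rough back-of-the-envelope
sketch in Python, assuming one map task per HDFS block, a 64 MB block
size, and a single tasktracker (all three are assumptions for
illustration, and map_task_plan is a made-up helper):

```python
import math


def map_task_plan(file_bytes, block_bytes, map_slots):
    """Estimate (number of map tasks, number of waves) for one tasktracker."""
    tasks = math.ceil(file_bytes / block_bytes)  # assumes one task per block
    waves = math.ceil(tasks / map_slots)         # slots run in parallel
    return tasks, waves


# 1 TB of input, 64 MB blocks (assumed), 8 map slots.
tasks, waves = map_task_plan(2**40, 64 * 2**20, 8)
print(tasks, waves)
```

With these numbers the job has thousands of map tasks, and the 8 slots
work through them 8 at a time.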


-----Original Message-----
From: Deepak Diwakar [mailto:[EMAIL PROTECTED]
Sent: Monday, July 07, 2008 1:29 AM
To: core-user@hadoop.apache.org
Subject: parallel mapping on single server

Hi,

I am pretty new to Hadoop. I ran a modification of wordcount on almost
a TB of data on a single server, but found that it takes too much time.
I found that only one core is used at a time, even though my server has
8 cores. I read that Hadoop speeds up computation in DFS mode. But how
do I make full use of a single server with multicore processors? Is
there a pseudo-DFS mode in Hadoop? What changes are required in the
config files? Please let me know in detail. Does it have anything to do
with hadoop-site.xml and mapred-default.xml?

Thanks in advance.
--
- Deepak Diwakar,
Associate Software Eng.,
Pubmatic, pune
Contact: +919960930405





