Re: parallel mapping on single server

2008-07-12 Thread hong


Hi,

I have a question about the strategy described by Joman Chu:

Hadoop will try to split the file according to how it is split up in
the HDFS


Take wordcount as an example. Suppose "hadoop" is a word in the input file,
and block 1 ends with "had" while block 2 starts with "oop". How is this
case handled?


Thanks for your reply
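
On the boundary question above: with line-oriented input such as
TextInputFormat, a record is a whole line, not a block, so a word is not cut
in half as long as it sits on one line. As far as I understand it, the reader
for a split skips the partial first line (unless the split starts at offset 0)
and reads past the end of its split to finish its last line, fetching the
remaining bytes from the next block. The Python sketch below only illustrates
that rule; `read_split` and the sample data are hypothetical and not Hadoop
code.

```python
def read_split(data: bytes, start: int, length: int):
    """Return the complete lines owned by the split [start, start + length).

    Illustrative only: mimics the usual rule for line-oriented readers,
    not the actual Hadoop implementation.
    """
    end = start + length
    pos = start
    if start != 0:
        # Back up one byte and discard everything through the next newline.
        # If a line begins exactly at `start`, only the newline before it
        # is discarded, so that line still belongs to this split.
        nl = data.find(b"\n", start - 1)
        pos = len(data) if nl == -1 else nl + 1
    lines = []
    # A line belongs to this split if it STARTS before `end`; it may be
    # read past `end` (that is, into the next block) to reach its newline.
    while pos < end and pos < len(data):
        nl = data.find(b"\n", pos)
        stop = len(data) if nl == -1 else nl
        lines.append(data[pos:stop].decode())
        pos = stop + 1
    return lines


data = b"hello hadoop world\nsecond line here\nthird\n"
# Offset 9 falls between "had" and "oop".
print(read_split(data, 0, 9))               # ['hello hadoop world']
print(read_split(data, 9, len(data) - 9))   # ['second line here', 'third']
```

The first split reads its line to the end even though the line crosses the
boundary, and the second split discards the tail "oop world", so "hadoop" is
counted exactly once.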

On 2008-7-11, at 5:27 AM, Joman Chu wrote:


Hadoop will try to split the file according to how it is split up in
the HDFS. For example, if an input file has three blocks with a
replication factor of two, there are six block replicas in total. Say
there are six machines, each holding a single replica. Block 1 is on
machines 1 and 2, block 2 is on 3 and 4, and block 3 is on 5 and 6.
Hadoop will make three Map tasks. Each task is assigned to a machine and
it will process the block that is stored locally on that machine. If it
can't do this, then blocks are transferred from within the rack first,
and then from machines farther away in the cluster.
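
The placement preference described above (local machine first, then same
rack, then anywhere) can be sketched as follows. This is a simplified
illustration in Python, not Hadoop's scheduler; `pick_host`, the machine
names, and the rack layout are all hypothetical.

```python
def pick_host(replica_hosts, free_hosts, rack_of):
    """Choose a host for a map task over one block.

    Preference order: a free host that stores a replica (node-local),
    then a free host in the same rack as a replica (rack-local),
    then any free host (off-rack).
    """
    for host in free_hosts:
        if host in replica_hosts:
            return host, "node-local"
    replica_racks = {rack_of[h] for h in replica_hosts}
    for host in free_hosts:
        if rack_of[host] in replica_racks:
            return host, "rack-local"
    if free_hosts:
        return free_hosts[0], "off-rack"
    return None, "none"


# Assumed layout: machines m1..m3 in rack A, m4..m6 in rack B.
rack_of = {"m1": "A", "m2": "A", "m3": "A", "m4": "B", "m5": "B", "m6": "B"}
block1 = ["m1", "m2"]  # block 1 is on machines 1 and 2

print(pick_host(block1, ["m2", "m3"], rack_of))  # ('m2', 'node-local')
print(pick_host(block1, ["m3", "m4"], rack_of))  # ('m3', 'rack-local')
print(pick_host(block1, ["m4"], rack_of))        # ('m4', 'off-rack')
```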

Joman Chu
AIM: ARcanUSNUMquam
IRC: irc.liquid-silver.net


On Thu, Jul 10, 2008 at 10:40 AM, hong [EMAIL PROTECTED] wrote:

Hi

Following up on Haijun Cao's reply:

Suppose we have set 8 map tasks. How does each map know which part of the
input file it should process?

On 2008-7-10, at 2:33 AM, Haijun Cao wrote:

Set the number of map slots per tasktracker to 8 in order to run 8 map
tasks on one machine (assuming one tasktracker per machine) at the same
time:

<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>8</value>
  <description>The maximum number of map tasks that will be run
  simultaneously by a task tracker.
  </description>
</property>
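
Note that the slot setting above only caps how many map tasks run at once;
the number of map tasks itself follows from how the input is split. The
back-of-the-envelope Python sketch below assumes one map task per split, a
64 MiB split size, and exactly 1 TiB of input. These figures and the helper
`map_schedule` are illustrative assumptions, not values taken from this
thread's cluster.

```python
def map_schedule(input_bytes, split_bytes, map_slots):
    """Return (map tasks, tasks running at once, scheduling waves).

    Assumes one map task per input split and a single tasktracker.
    """
    # Ceiling division in pure integers; at least one task for tiny inputs.
    num_tasks = max(1, -(-input_bytes // split_bytes))
    concurrent = min(num_tasks, map_slots)
    waves = -(-num_tasks // map_slots)
    return num_tasks, concurrent, waves


MiB = 1024 ** 2
TiB = 1024 ** 4

# Assumed: 1 TiB of input, 64 MiB splits, 8 map slots on the one machine.
print(map_schedule(1 * TiB, 64 * MiB, 8))  # (16384, 8, 2048)
```

With one slot the same job would need 16384 waves, which is consistent with
only one core being busy at a time.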


-----Original Message-----
From: Deepak Diwakar [mailto:[EMAIL PROTECTED]
Sent: Monday, July 07, 2008 1:29 AM
To: core-user@hadoop.apache.org
Subject: parallel mapping on single server

Hi,

I am pretty new to Hadoop. I ran a modification of wordcount on almost a
TB of data on a single server, but found that it takes too much time.
Actually, I found that only one core is utilized at a time, even though my
server has 8 cores. I read that Hadoop speeds up computation in DFS mode.
But how do I make full use of a single server with multicore processors?
Is there a way to do this in pseudo-DFS mode in Hadoop? What changes are
required in the config files? Please let me know in detail. Does it have
anything to do with hadoop-site.xml and mapred-default.xml?

Thanks in advance.
--
- Deepak Diwakar,
Associate Software Eng.,
Pubmatic, pune
Contact: +919960930405

Re: parallel mapping on single server

2008-07-10 Thread hong


Hi

Following up on Haijun Cao's reply:

Suppose we have set 8 map tasks. How does each map know which part of the
input file it should process?



RE: parallel mapping on single server

2008-07-09 Thread Haijun Cao

Set the number of map slots per tasktracker to 8 in order to run 8 map tasks
on one machine (assuming one tasktracker per machine) at the same time:


<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>8</value>
  <description>The maximum number of map tasks that will be run
  simultaneously by a task tracker.
  </description>
</property>


