[ https://issues.apache.org/jira/browse/MAPREDUCE-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12866020#action_12866020 ]

Hemanth Yamijala commented on MAPREDUCE-1781:
---------------------------------------------

mapred.tasktracker.map.tasks.maximum is a startup configuration parameter and 
cannot be modified per job. Even in your first scenario (where it seemed to 
work), I am guessing that the system started running 1 map per node because of 
scheduling decisions and not because the tasktrackers were configured to run 
with only 1 task per node.
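For reference, a sketch of where this property is normally set (assuming the default 0.20 configuration layout): in conf/mapred-site.xml on every TaskTracker host, where it is read once at daemon startup:

```xml
<!-- conf/mapred-site.xml on each TaskTracker node.
     Read once when the TaskTracker daemon starts; a restart is
     required for changes to take effect. -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>1</value>
  <description>Maximum number of map task slots this TaskTracker
  will run concurrently.</description>
</property>
```

Passing the same property with -D at job submission only changes that job's copy of the configuration; the running TaskTrackers never re-read it.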

bq. Why is this happening and how can I make it work properly (i.e. be able to 
limit exactly how many mappers I can have at 1 time per node)?

Can you provide some more details on why you want to limit a job to use only 
one mapper at a time on a node?

> option "-D mapred.tasktracker.map.tasks.maximum=1" does not work when the number of 
> mappers is bigger than the number of nodes - always spawns 2 mappers/node
> --------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1781
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1781
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: contrib/streaming
>    Affects Versions: 0.20.2
>         Environment: Debian Lenny x64, and Hadoop 0.20.2, 2GB RAM
>            Reporter: Tudor Vlad
>
> Hello
> I am a new user of Hadoop and I have some trouble using Hadoop Streaming and 
> the "-D mapred.tasktracker.map.tasks.maximum" option. 
> I'm experimenting with an unmanaged application (C++) which I want to run 
> over several nodes in 2 scenarios:
> 1) the number of maps (input splits) is equal to the number of nodes
> 2) the number of maps is a multiple of the number of nodes (5, 10, 20, ...)
> Initially, when running the tests in scenario 1 I would sometimes get 2 
> processes/node on half the nodes. However, I fixed this by adding the option "-D 
> mapred.tasktracker.map.tasks.maximum=1", and everything worked fine.
> In the case of scenario 2 (more maps than nodes) this directive no longer 
> works; I always get 2 processes/node. I even tested with maximum=5 and I 
> still get 2 processes/node.
> The entire command I use is:
> /usr/bin/time --format="-duration:\t%e |\t-MFaults:\t%F 
> |\t-ContxtSwitch:\t%w" \
>  /opt/hadoop/bin/hadoop jar 
> /opt/hadoop/contrib/streaming/hadoop-0.20.2-streaming.jar \
>  -D mapred.tasktracker.map.tasks.maximum=1 \
>  -D mapred.map.tasks=30 \
>  -D mapred.reduce.tasks=0 \
>  -D io.file.buffer.size=5242880 \
>  -libjars "/opt/hadoop/contrib/streaming/hadoop-7debug.jar" \
>  -input input/test \
>  -output out1 \
>  -mapper "/opt/jobdata/script_1k" \
>  -inputformat "me.MyInputFormat"
> Why is this happening and how can I make it work properly (i.e. be able to 
> limit exactly how many mappers I can have at 1 time per node)?
> Thank you in advance

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.