you can specify the number of concurrent tasks per node in your config. don't
think the second thing you mention (chaining splits through one jvm) is possible.
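
for the first part, a minimal sketch of what i mean — a hadoop-site.xml override on each node (hedged: the property name below is the 0.1x-era one from your own list; later hadoop versions split it into separate map/reduce variants, so check the defaults file for your release):

```xml
<!-- hadoop-site.xml on each worker node: cap the number of tasks a
     TaskTracker will run at once. With value 1, at most one map (or
     reduce) task runs on the node at any time. -->
<property>
  <name>mapred.tasktracker.tasks.maximum</name>
  <value>1</value>
</property>
```

note this is a per-tasktracker setting, not a per-job one, which is why setting mapred.map.tasks in the JobConf didn't help — that only hints at the total number of maps for the job. you'll need to restart the tasktrackers after changing it.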

hadoop + ec2 has worked very well for me. good luck.

derek


On 9/8/07, Devajyoti Sarkar <[EMAIL PROTECTED]> wrote:
> Hi All,
>
> I am new to hadoop and I seem to be having a problem setting the number of
> map tasks per node. I have an application that needs to load a significant
> amount of data (about 1 GB) in memory to use in mapping data read from
> files. I store this in a singleton and access it from my mapper. In order to
> do this, I need to have exactly one map task run on a node at any one time or
> the memory requirements will far exceed my RAM. I am generating my own
> Splits using an InputFormat class. This gives me roughly 10 splits per node
> and I need each corresponding map task to run sequentially in the same
> child JVM so that each map run does not have to reinitialize the data.
>
> I have tried the following in a single node configuration and 2 splits:
> - setting setNumMapTasks in the JobConf to 1, but hadoop still creates 2
> map tasks
> - setting the mapred.tasktracker.tasks.maximum property to 1 - same
> result, 2 map tasks
> - setting the mapred.map.tasks property to 1 - same result, 2 map tasks
>
> I have yet to try it in a multiple-node configuration. My target will be
> using 20 AWS EC2 instances.
>
> Can you please let me know what I should be doing or looking at to make sure
> that I have at most 1 map task per node? Also, how can I have multiple
> splits mapped within the same child JVM by different map tasks run in
> sequence?
>
> Thanks in advance,
> Dev
>
