You can specify the number of tasks per node in your config. Note that mapred.tasktracker.tasks.maximum is a tasktracker setting: it has to go in the config file on each node (and the tasktracker restarted to pick it up), not in the per-job JobConf, which is likely why setting it there had no effect for you. I don't think the second thing you mention (running multiple splits sequentially in the same child JVM) is possible.
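Something like this in the site config on each worker node should do it (property name taken from your mail; this is a sketch for a Hadoop config of that era, so double-check the name against your version's defaults file):

```xml
<!-- in hadoop-site.xml on each worker node; requires a tasktracker
     restart to take effect, since the tasktracker reads it at startup -->
<property>
  <name>mapred.tasktracker.tasks.maximum</name>
  <value>1</value>
</property>
```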
hadoop + ec2 has worked very well for me. good luck.

derek

On 9/8/07, Devajyoti Sarkar <[EMAIL PROTECTED]> wrote:
> Hi All,
>
> I am new to Hadoop and I seem to be having a problem setting the number of
> map tasks per node. I have an application that needs to load a significant
> amount of data (about 1 GB) into memory to use while mapping data read
> from files. I store this in a singleton and access it from my mapper. To
> do this, I need exactly one map task running on a node at any one time, or
> the memory requirements will far exceed my RAM. I am generating my own
> splits using an InputFormat class. This gives me roughly 10 splits per
> node, and I need each corresponding map task to run sequentially in the
> same child JVM so that each map run does not have to reinitialize the
> data.
>
> I have tried the following in a single-node configuration with 2 splits:
> - setting setNumMapTasks in the JobConf to 1, but Hadoop still creates 2
>   map tasks
> - setting the mapred.tasktracker.tasks.maximum property to 1 - same
>   result, 2 map tasks
> - setting the mapred.map.tasks property to 1 - same result, 2 map tasks
>
> I have yet to try it in a multi-node configuration. My target will be
> 20 AWS EC2 instances.
>
> Can you please let me know what I should be doing or looking at to make
> sure that I have at most 1 map task per node? Also, how can I have
> multiple splits mapped within the same child JVM by different map tasks
> in sequence?
>
> Thanks in advance,
> Dev
