Hello,

In HDFS we have set the block size to 40 bytes. The input data set is as below, with each record terminated by a line feed:

    data1 (5*8 = 40 bytes)
    data2
    ......
    .......
    data10
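For context, the job is submitted with an ordinary old-API driver along the following lines. This is only a rough sketch of the kind of driver we are using, not our real code; SimulationDriver, SimulationMapper, the job name, and the argument paths are all placeholder names:

    import java.io.IOException;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;
    import org.apache.hadoop.mapred.TextInputFormat;

    public class SimulationDriver {

      public static class SimulationMapper extends MapReduceBase
          implements Mapper<LongWritable, Text, NullWritable, Text> {
        public void map(LongWritable offset, Text line,
                        OutputCollector<NullWritable, Text> out,
                        Reporter reporter) throws IOException {
          // Placeholder: the complex mathematical simulation runs here,
          // once per input record (one line of the file).
          out.collect(NullWritable.get(), line);
        }
      }

      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(SimulationDriver.class);
        conf.setJobName("parallel-simulation");

        // Default line-oriented input. As far as we can tell, the splits
        // are computed from the file's byte layout (block size), not from
        // the line feeds, which may be the root of our problem.
        conf.setInputFormat(TextInputFormat.class);

        conf.setMapperClass(SimulationMapper.class);
        conf.setNumReduceTasks(0); // map-only job, pure computation

        conf.setOutputKeyClass(NullWritable.class);
        conf.setOutputValueClass(Text.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
      }
    }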
But still we see only 2 map tasks spawned; there should have been at least 10. Each mapper performs a complex mathematical computation, so we want one map task per record, but splitting on line feeds does not seem to happen, and we are not sure how the split calculation works internally. Even with the settings below, the number of map tasks never goes beyond 2. Is there any way to make the job spawn 10 tasks? Basically it should behave like a compute grid, running the computations in parallel.

    <property>
      <name>io.bytes.per.checksum</name>
      <value>30</value>
      <description>The number of bytes per checksum. Must not be larger
      than io.file.buffer.size.</description>
    </property>

    <property>
      <name>dfs.block.size</name>
      <value>30</value>
      <description>The default block size for new files.</description>
    </property>

    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>10</value>
      <description>The maximum number of map tasks that will be run
      simultaneously by a task tracker.</description>
    </property>

This is a single node with a high-end configuration: 8 CPUs and 8 GB of memory, hence the example of 10 line-feed-terminated data items. We want to utilize the full power of the machine, so we want at least 10 map tasks, each performing a highly complex mathematical simulation. At present it looks like the split size of the file data (in bytes) is the only way to control the number of map tasks, but we would prefer a criterion such as the line feed. How do we get 10 map tasks from the above configuration? Please help, and thanks.
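P.S. While writing this up we came across NLineInputFormat in org.apache.hadoop.mapred.lib, which, as far as we can tell, builds one split per N input lines instead of per block, which sounds like exactly the line-feed criterion we want. A sketch of the change against the driver above, assuming the old mapred API (if we read the source correctly, the property below is the one its configure() method reads; please correct us if not):

    import org.apache.hadoop.mapred.lib.NLineInputFormat;

    // Replace the TextInputFormat line in the driver sketch above:
    conf.setInputFormat(NLineInputFormat.class);
    // One input line per split -> 10 lines should give 10 map tasks.
    conf.setInt("mapred.line.input.format.linespermap", 1);

Would this reliably give us one map task per line, so the job fans out across all 8 CPUs (subject to mapred.tasktracker.map.tasks.maximum)? Or alternatively, is conf.setNumMapTasks(10) supposed to work here, or is that only a hint?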
