The maximum number of tasks running at once per node is dictated by these two properties:

<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>6</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>4</value>
</property>

I do not work with EC2, so I do not know how to adjust them there.

Hive prints a message like this during the query:

Number of reduce tasks not specified. Estimated from input data size: 8

It also prints some information on how to adjust this. You can set a fixed value with

set mapred.reduce.tasks=X

or tell Hive how much data each reducer should handle. See the verbose query output.

The number of map tasks is normally chosen by Hadoop. You can go higher, but you cannot make Hadoop go lower than it wants to. You can force the issue a bit by setting mapred.map.tasks=X

On Fri, Feb 19, 2010 at 3:52 PM, Saurabh Nanda <[email protected]> wrote:
> Hi,
>
> Is there any page/document that describes the methods/techniques used by
> Hive to arrive at the optimum number of map tasks & optimum number of reduce
> tasks?
>
> I'm running a 3-node Amazon EMR cluster, and Hive has determined that 34 map
> & 2 reduce tasks are optimum. Out of the 34 map tasks only 4 are actively
> running at any given instant. Any explanations why this exact number?
>
> Saurabh.
> --
> http://nandz.blogspot.com
> http://foodieforlife.blogspot.com
>
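To make the two numbers discussed in the reply more concrete, here is a rough Python sketch of the arithmetic as I understand it. It is not Hive's or Hadoop's actual code. The function names are made up for illustration, and the defaults (about 1 GB per reducer, a cap of 999 reducers) are my assumption of what hive.exec.reducers.bytes.per.reducer and hive.exec.reducers.max were set to in Hive releases of this era, so check your own version.

```python
def estimate_reducers(input_bytes,
                      bytes_per_reducer=1_000_000_000,  # assumed default
                      max_reducers=999):                # assumed default
    """Approximate how Hive sizes the reduce phase when
    mapred.reduce.tasks is not set: roughly one reducer per
    bytes_per_reducer of input, at least 1, capped at max_reducers."""
    needed = -(-input_bytes // bytes_per_reducer)  # ceiling division
    return max(1, min(max_reducers, needed))


def max_concurrent_tasks(tasktracker_nodes, slots_per_node):
    """Upper bound on tasks running at the same instant: each node
    running a TaskTracker contributes a fixed number of slots
    (mapred.tasktracker.map.tasks.maximum or the reduce equivalent)."""
    return tasktracker_nodes * slots_per_node


if __name__ == "__main__":
    # About 8 GB of input gives the "Estimated from input data size: 8" case.
    print(estimate_reducers(8 * 10**9))
    # Three TaskTracker nodes with 6 map slots each, as in the config above.
    print(max_concurrent_tasks(3, 6))
```

With this reading, a limit of 4 concurrent map tasks would be consistent with, for example, two nodes running tasktrackers with 2 map slots each. That is only a guess about the EMR setup, not something stated in the thread.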
