Hi,

Is there any page or document that describes how Hive arrives at the optimal number of map tasks and the optimal number of reduce tasks?
I'm running a 3-node Amazon EMR cluster, and Hive has determined that 34 map tasks and 2 reduce tasks are optimal. Of the 34 map tasks, only 4 are actively running at any given instant. Can anyone explain why these exact numbers were chosen?

Saurabh
--
http://nandz.blogspot.com
http://foodieforlife.blogspot.com
