As I understand it, mapred.min.split.size defines the minimum size of a split. In the case below:
(1) HDFS block size = 32 MB, mapred.min.split.size = 64 MB (mapred.min.split.size can only be set to a value larger than the HDFS block size)

when I run MapReduce, each map will process one input split of 64 MB, which in reality contains 2 HDFS blocks. Is this right?

On Fri, Mar 18, 2011 at 8:12 PM, Marcos Ortiz <mlor...@uci.cu> wrote:
> On 3/18/2011 3:54 PM, Pedro Costa wrote:
>>
>> Hi
>>
>> What's the purpose of the parameter "mapred.min.split.size"?
>>
>> Thanks,
>>
>
> There are many parameters that control the number of map tasks for a Job,
> and mapred.min.split.size controls the minimum size of a split. Other
> parameters are:
> - mapreduce.map.tasks: The suggested number of map tasks
> - dfs.block.size: the file system block size in bytes of the input file
>
> Regards
>
> --
> Marcos Luís Ortíz Valmaseda
> Software Engineer
> Universidad de las Ciencias Informáticas
> Linux User # 418229
>
> http://uncubanitolinuxero.blogspot.com
> http://www.linkedin.com/in/marcosluis2186
>

--
Pedro
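
To make the 32 MB / 64 MB case above concrete, here is a minimal sketch of the arithmetic. It assumes the rule used by the old-API FileInputFormat, splitSize = max(minSize, min(goalSize, blockSize)), where goalSize is the total input size divided by the suggested number of map tasks. The helper names, the 256 MB file size, and the map-task hint of 2 are hypothetical values chosen only for illustration; they do not come from the thread.

```python
MB = 1024 * 1024


def compute_split_size(goal_size, min_size, block_size):
    """Mirror of the assumed rule: max(minSize, min(goalSize, blockSize))."""
    return max(min_size, min(goal_size, block_size))


def blocks_per_split(split_size, block_size):
    """Number of HDFS blocks covered by one full split (ceiling division)."""
    return -(-split_size // block_size)


# Values from the question.
block_size = 32 * MB   # dfs.block.size
min_split = 64 * MB    # mapred.min.split.size

# Hypothetical values, assumed only for this illustration.
total_input = 256 * MB  # size of the input file
map_task_hint = 2       # suggested number of map tasks

goal_size = total_input // map_task_hint
split_size = compute_split_size(goal_size, min_split, block_size)
blocks = blocks_per_split(split_size, block_size)

print(f"split size: {split_size // MB} MB, blocks per split: {blocks}")

# For comparison: a minimum below the block size leaves the split at one block.
small_min_split = compute_split_size(goal_size, 1, block_size)
print(f"split size with min=1 byte: {small_min_split // MB} MB")
```

Under these assumptions each map gets a 64 MB split covering 2 HDFS blocks, which matches the reading in the question. The comparison at the end suggests that, under this rule, a minimum below the block size has no effect on the split size.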