The following applies to Hadoop 0.20.2.

2011/5/25 Juwei Shi <shiju...@gmail.com>
> The input split size is determined by mapred.min.split.size, dfs.block.size and
> mapred.map.tasks.
>
> goalSize = totalSize / mapred.map.tasks
> minSize = max {mapred.min.split.size, minSplitSize}
> splitSize = max (minSize, min(goalSize, dfs.block.size))
>
> minSplitSize is determined by each InputFormat, such as
> SequenceFileInputFormat.
>
> You may want to refer to FileInputFormat.java for more details.
>
>
> 2011/5/25 Mapred Learn <mapred.le...@gmail.com>
>
>> Resending ====>
>>
>> > Hi,
>> > I have a few input splits that are a few MB in size.
>> > I want to submit 1 GB of input to every mapper. Does anyone know how I
>> > can do it?
>> > Currently each mapper gets one input split, which results in many small
>> > map-output files.
>> >
>> > I tried setting -Dmapred.map.min.split.size=<number>, but it still does
>> > not take effect.
>> >
>> > Thanks,
>> > -JJ
>
> --
> - Juwei Shi

--
- Juwei Shi (史巨伟)
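The split-size formula quoted above can be sketched in plain Java to see which term wins. This is a minimal illustration, not the actual FileInputFormat source; the class and method names, the 10 GB input, the 64 MB block size, the map-task hint of 2, and an InputFormat minSplitSize of 1 byte are all assumptions chosen for the example.

```java
// Hypothetical sketch of the split-size rule from the reply above (Hadoop 0.20.2, old API).
// Names and sample values are illustrative assumptions, not Hadoop's own code.
public class SplitSizeSketch {

    // splitSize = max(minSize, min(goalSize, blockSize))
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    // goalSize = totalSize / mapred.map.tasks (a zero hint is treated as 1 to avoid dividing by zero)
    static long goalSize(long totalSize, int mapTasks) {
        return totalSize / (mapTasks == 0 ? 1 : mapTasks);
    }

    // minSize = max(mapred.min.split.size, minSplitSize of the InputFormat)
    static long minSize(long configuredMinSplitSize, long formatMinSplitSize) {
        return Math.max(configuredMinSplitSize, formatMinSplitSize);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long gb = 1024L * mb;
        long totalSize = 10 * gb;   // assumed: 10 GB of total input
        long blockSize = 64 * mb;   // assumed: 64 MB dfs.block.size
        int mapTasks = 2;           // assumed: mapred.map.tasks hint

        long goal = goalSize(totalSize, mapTasks);

        // mapred.min.split.size left at 1 byte: the block size caps the split.
        long defaultSplit = computeSplitSize(goal, minSize(1, 1), blockSize);
        System.out.println("default split = " + defaultSplit / mb + " MB");

        // mapred.min.split.size = 1 GB: minSize now dominates min(goalSize, blockSize).
        long largeSplit = computeSplitSize(goal, minSize(gb, 1), blockSize);
        System.out.println("with 1 GB min = " + largeSplit / mb + " MB");
    }
}
```

Running it prints `default split = 64 MB` and then `with 1 GB min = 1024 MB`, so a 1 GB mapred.min.split.size (the key named in the formula, not mapred.map.min.split.size) is what lifts the split above the block size. Note that FileInputFormat applies this size to each input file separately, so a file of only a few MB still becomes its own split.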