Trade-off between HDFS efficiency and data locality.
On Tue, May 5, 2009 at 9:37 AM, Arun C Murthy <a...@yahoo-inc.com> wrote:

> On May 5, 2009, at 4:47 AM, Christian Ulrik Søttrup wrote:
>
>> Hi all,
>>
>> I have a job that creates very big local files, so I need to split it
>> across as many mappers as possible. With the DFS block size I'm using,
>> this job is only split into 3 mappers. I don't want to change the
>> cluster-wide HDFS block size because it works for my other jobs.
>>
> I would rather keep the big files on HDFS and use -Dmapred.min.split.size
> to get more maps to process your data....
>
> http://hadoop.apache.org/core/docs/r0.20.0/mapred_tutorial.html#Job+Input
>
> Arun

--
Alpha Chapters of my book on Hadoop are available
http://www.apress.com/book/view/9781430219422
www.prohadoopbook.com, a community for Hadoop Professionals
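For readers who want to try Arun's suggestion, below is a minimal, hypothetical job driver against the old org.apache.hadoop.mapred API from Hadoop 0.20 (the release the linked tutorial covers). One caveat worth hedging: in that release, FileInputFormat computes the split size as max(mapred.min.split.size, min(goalSize, blockSize)), where goalSize is the total input size divided by the mapred.map.tasks hint, so keeping the minimum at its floor and raising the map-count hint is what actually yields splits smaller than one HDFS block. The class name, job name, and map count here are illustrative, not from the thread.

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Hypothetical driver: shrinks map splits below the HDFS block size
// without touching the cluster-wide block size.
public class SplitSizeExample extends Configured implements Tool {

  public int run(String[] args) throws Exception {
    JobConf conf = new JobConf(getConf(), SplitSizeExample.class);
    conf.setJobName("more-maps-example");

    // FileInputFormat (0.20) picks the split size as
    //   max(mapred.min.split.size, min(goalSize, blockSize))
    // with goalSize = totalInputBytes / numMapTasks. Keeping the
    // minimum low and raising the map-count hint shrinks the splits,
    // so a 3-block input can fan out to many more mappers.
    conf.setLong("mapred.min.split.size", 1);
    conf.setNumMapTasks(48); // illustrative target; tune for your input

    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    JobClient.runJob(conf);
    return 0;
  }

  public static void main(String[] args) throws Exception {
    // ToolRunner makes generic -D options (e.g. -Dmapred.min.split.size=...)
    // work on the command line, as in Arun's reply.
    System.exit(ToolRunner.run(new SplitSizeExample(), args));
  }
}

With ToolRunner in place, the same knobs can be adjusted per run without recompiling, e.g.:

  hadoop jar myjob.jar SplitSizeExample -D mapred.map.tasks=48 /input /output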