Thanks, Doug. We'll take a look at modifying this parameter. And yes, we'd like to contribute this once we have the performance at an acceptable level.
Thanks.
Jonathan

On Wed, May 6, 2009 at 12:55 PM, Doug Cutting <cutt...@apache.org> wrote:

> Jonathan Seidman wrote:
>
>> We've created an implementation of FileSystem which allows us to use Sector
>> (http://sector.sourceforge.net/) as the backing store for Hadoop. This
>> implementation is functionally complete, and we can now run Hadoop MapReduce
>> jobs against data stored in Sector.
>>
>
> Please consider contributing this to Hadoop.
>
>> We're now looking at how to optimize this interface, since the performance
>> suffers considerably compared to MR processing run against HDFS.
>>
>
> Have you tried setting mapred.min.split.size to a large value, so that
> files are not generally split? Alternately, you might override
> FileInputFormat#computeSplitSize.
>
> Doug

--
Jonathan Seidman
Open Data Group
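To see why a large mapred.min.split.size keeps files whole, here is a minimal standalone sketch. It assumes the split-size formula used by the old-API FileInputFormat#computeSplitSize, max(minSize, min(goalSize, blockSize)), plus a 10% "slop" allowance on the last split. The class SplitSizeSketch and its countSplits helper are hypothetical. They only mirror that logic and do not depend on Hadoop, and the file and block sizes are illustrative rather than anything measured on Sector.

```java
// Hypothetical standalone class: mirrors, but does not call, Hadoop's FileInputFormat split logic.
public class SplitSizeSketch {
    // Assumed slack factor: the last split may be up to 10% larger than splitSize.
    static final double SPLIT_SLOP = 1.1;

    // Same formula as the old-API FileInputFormat#computeSplitSize.
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    // Counts how many splits a single file of fileLength bytes would produce.
    static int countSplits(long fileLength, long splitSize) {
        int splits = 0;
        long remaining = fileLength;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits++; // leftover bytes form the final split
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long fileLength = 1024 * mb; // illustrative 1 GB input file
        long blockSize = 64 * mb;    // illustrative block size reported by the FileSystem
        long goalSize = fileLength;  // assumes a numSplits hint of 1, so goalSize = total size

        // Default minimum split size of 1 byte: the block size determines the split size.
        long defaultSplit = computeSplitSize(goalSize, 1L, blockSize);
        System.out.println("default: " + countSplits(fileLength, defaultSplit) + " splits");

        // A very large minimum split size: the whole file becomes a single split.
        long largeSplit = computeSplitSize(goalSize, Long.MAX_VALUE, blockSize);
        System.out.println("large min split: " + countSplits(fileLength, largeSplit) + " splits");
    }
}
```

With these numbers the default yields 16 splits, and the large minimum yields 1. In a real job the equivalent would be setting mapred.min.split.size on the job configuration, for example with JobConf#setLong. Overriding computeSplitSize in a FileInputFormat subclass achieves the same effect per input format. Whether fewer, larger splits actually close the gap with HDFS depends on how the Sector FileSystem reports block locations and handles reads, which this sketch does not model.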