Jonathan Seidman wrote:
We've created an implementation of Hadoop's FileSystem interface that allows us to use Sector (http://sector.sourceforge.net/) as the backing store for Hadoop. The implementation is functionally complete, and we can now run Hadoop MapReduce jobs against data stored in Sector.
Please consider contributing this to Hadoop.
We're now looking at how to optimize this interface, since performance suffers considerably compared to MapReduce jobs run against HDFS.
Have you tried setting mapred.min.split.size to a large value, so that files are generally not split? Alternatively, you might override FileInputFormat#computeSplitSize.
Doug
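To see why a large mapred.min.split.size prevents splitting, here is a small sketch of the split-size formula used by the classic FileInputFormat, max(minSize, min(goalSize, blockSize)). The block and goal sizes below are illustrative values, not taken from the thread:

```java
public class SplitSizeSketch {
    // Mirrors the formula in the classic FileInputFormat#computeSplitSize:
    // max(minSize, min(goalSize, blockSize)).
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    public static void main(String[] args) {
        long blockSize = 64L << 20;   // 64 MB, a typical HDFS block size (illustrative)
        long goalSize  = 128L << 20;  // total input bytes / desired number of maps (illustrative)

        // Default minSize (1 byte): the split size falls back to the block size,
        // so each file is chopped into block-sized splits.
        System.out.println(computeSplitSize(goalSize, 1L, blockSize));       // 67108864

        // Large mapred.min.split.size (here 1 GB): the minimum dominates, so any
        // file smaller than 1 GB becomes a single split handled by one map task.
        System.out.println(computeSplitSize(goalSize, 1L << 30, blockSize)); // 1073741824
    }
}
```

Raising the minimum has the same effect as overriding computeSplitSize to return a large value: fewer, larger splits, which avoids paying the remote-read overhead of a non-local store like Sector once per small split.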