Jonathan Seidman wrote:
We've created an implementation of FileSystem that allows us to use Sector
(http://sector.sourceforge.net/) as the backing store for Hadoop. This
implementation is functionally complete, and we can now run Hadoop MapReduce
jobs against data stored in Sector.

Please consider contributing this to Hadoop.

We're now looking at how to optimize this interface, since performance
suffers considerably compared to MapReduce jobs run against HDFS.

Have you tried setting mapred.min.split.size to a large value, so that files are not generally split? Alternatively, you might override FileInputFormat#computeSplitSize.
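To illustrate why raising the minimum helps: in the org.apache.hadoop.mapred.FileInputFormat of this era, the split size is computed as max(minSize, min(goalSize, blockSize)), where goalSize is roughly totalSize / numSplits and minSize comes from mapred.min.split.size. The sketch below reimplements that formula standalone (the class name SplitSizeDemo is ours, for illustration; it is not part of Hadoop) to show how a large minimum keeps a file in a single split:

```java
// Standalone sketch of Hadoop's split-size computation, as implemented
// in org.apache.hadoop.mapred.FileInputFormat of this era. SplitSizeDemo
// is a hypothetical name used only for this illustration.
public class SplitSizeDemo {

    // splitSize = max(minSize, min(goalSize, blockSize))
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024;  // 64 MB, a common block size
        long goalSize  = 32L * 1024 * 1024;  // roughly totalSize / numSplits

        // With the default minimum of 1 byte, the goal size wins:
        // the file is carved into 32 MB splits.
        System.out.println(computeSplitSize(goalSize, 1L, blockSize));

        // With mapred.min.split.size raised to 1 GB, the minimum
        // dominates, so any file under 1 GB becomes a single split
        // (and a single map task reading it sequentially).
        long minSize = 1024L * 1024 * 1024;
        System.out.println(computeSplitSize(goalSize, minSize, blockSize));
    }
}
```

The minimum can be set per job (e.g. -D mapred.min.split.size=1073741824 on the command line) without subclassing; overriding computeSplitSize is only needed when a fixed formula like this one isn't flexible enough for the backing store.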

Doug
