Jonathan Seidman wrote:
We've created an implementation of Hadoop's FileSystem interface that allows us to use Sector (http://sector.sourceforge.net/) as the backing store for Hadoop. The implementation is functionally complete, and we can now run Hadoop MapReduce jobs against data stored in Sector.
Please consider contributing this to Hadoop.
We're now looking at how to optimize this interface, since performance suffers considerably compared to MapReduce jobs run against HDFS.
Have you tried setting mapred.min.split.size to a large value, so that files are generally not split? Alternatively, you might override FileInputFormat#computeSplitSize.
Doug
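To see why a large mapred.min.split.size prevents splitting, here is a small sketch of the split-size formula used by the classic FileInputFormat, max(minSize, min(goalSize, blockSize)). The block and goal sizes below are illustrative values, not taken from the thread:

```java
public class SplitSizeSketch {
    // Mirrors the formula in the classic FileInputFormat#computeSplitSize:
    // max(minSize, min(goalSize, blockSize)).
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    public static void main(String[] args) {
        long blockSize = 64L << 20;   // 64 MB, a typical HDFS block size (illustrative)
        long goalSize  = 128L << 20;  // total input bytes / desired number of maps (illustrative)

        // Default minSize (1 byte): the split size falls back to the block size,
        // so each file is chopped into block-sized splits.
        System.out.println(computeSplitSize(goalSize, 1L, blockSize));       // 67108864

        // Large mapred.min.split.size (here 1 GB): the minimum dominates, so any
        // file smaller than 1 GB becomes a single split handled by one map task.
        System.out.println(computeSplitSize(goalSize, 1L << 30, blockSize)); // 1073741824
    }
}
```

Raising the minimum has the same effect as overriding computeSplitSize to return a large value: fewer, larger splits, which avoids paying the remote-read overhead of a non-local store like Sector once per small split.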