Re: Optimizing Hadoop MR with File Based File Systems

Jonathan Seidman Wed, 06 May 2009 11:53:00 -0700

Thanks, Doug. We'll take a look at modifying this parameter.

And yes, we'd like to contribute this once we have the performance at an
acceptable level.


Thanks.

Jonathan

On Wed, May 6, 2009 at 12:55 PM, Doug Cutting <cutt...@apache.org> wrote:

> Jonathan Seidman wrote:
>
>> We've created an implementation of FileSystem which allows us to use Sector
>> (http://sector.sourceforge.net/) as the backing store for Hadoop. This
>> implementation is functionally complete, and we can now run Hadoop MapReduce
>> jobs against data stored in Sector.
>>
>
> Please consider contributing this to Hadoop.
>
>> We're now looking at how to optimize this interface, since the performance
>> suffers considerably compared to MR processing run against HDFS.
>>
>
> Have you tried setting mapred.min.split.size to a large value, so that
> files are not generally split?  Alternately, you might override
> FileInputFormat#computeSplitSize.
>
> Doug
>
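
For reference, here is a rough sketch of the arithmetic behind Doug's
suggestion. It assumes the split-size rule that the old-API
FileInputFormat#computeSplitSize applies, max(minSize, min(goalSize,
blockSize)); the class name SplitSizeSketch, the helper countSplits, and all
of the sizes are hypothetical and chosen only for illustration. The real
split loop also tolerates a slightly oversized final split, so actual counts
can differ from this simplified ceiling division.

```java
public class SplitSizeSketch {
    // Assumed rule: the split size is the block size capped by the goal size,
    // but never smaller than the configured minimum split size.
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    // Simplified split count for one file: ceiling of fileLength / splitSize.
    static long countSplits(long fileLength, long splitSize) {
        return (fileLength + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long fileLength = 1024 * mb;    // hypothetical 1 GB input file
        long blockSize = 64 * mb;       // hypothetical block size reported by the FileSystem
        long goalSize = fileLength / 2; // total input size / requested number of maps

        // With a minimum split size of 1 byte, the block size decides.
        long defaultSplit = computeSplitSize(goalSize, 1L, blockSize);

        // With a minimum split size larger than the file, the file stays whole.
        long largeSplit = computeSplitSize(goalSize, 4096 * mb, blockSize);

        System.out.println("min=1 byte: " + countSplits(fileLength, defaultSplit) + " splits");
        System.out.println("min=4 GB:   " + countSplits(fileLength, largeSplit) + " splits");
    }
}
```

In a job, the same effect would presumably come from setting
mapred.min.split.size in the job configuration to a value larger than the
biggest input file, rather than from code like the above.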



-- 
Jonathan Seidman
Open Data Group
