Dear Ravi and all,

Thanks very much for your kind reply. I am currently wondering whether it
is possible to replace the HTTP GET fetch with some other transfer
mechanism, but so far I have not come up with a better idea.
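
For context, here is my rough reading of how the current fetch works: each
reduce pulls a map output segment from the TaskTracker's embedded HTTP
server. The sketch below is only my simplified understanding, not the
actual MapOutputCopier code; the servlet path and the query parameter
names are illustrative and may not match the real ones exactly.

    import java.io.InputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    // Sketch only: the reducer asks the TaskTracker's HTTP server for one
    // map's output for its partition. The URL layout is illustrative.
    public class ShuffleFetchSketch {
        static InputStream fetchMapOutput(String host, int port,
                                          String jobId, String mapId,
                                          int reducePartition)
                throws Exception {
            URL url = new URL("http://" + host + ":" + port
                    + "/mapOutput?job=" + jobId
                    + "&map=" + mapId
                    + "&reduce=" + reducePartition);
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            // The response body is the serialized (possibly compressed)
            // map output segment for this reduce's partition.
            return conn.getInputStream();
        }
    }

Any replacement would need to cover both this request path and the server
side that serves the map output files, which is part of why I have not
found a better approach yet.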
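
On the in-memory question that Ravi answered below: as I read the code,
the decision comes down to roughly the following. This is a simplified
sketch under my own assumptions, not the real ramManager in
ReduceTask.java, whose accounting is more involved.

    // Sketch of the two rules from the comment: (a) a single
    // (decompressed) map output must be smaller than 25% of the in-memory
    // buffer, and (b) there must be room left, otherwise the segment is
    // shuffled straight to disk.
    public class RamManagerSketch {
        private final long memoryLimit; // bytes for in-memory map outputs
        private long used;              // bytes currently reserved

        RamManagerSketch(long memoryLimit) {
            this.memoryLimit = memoryLimit;
        }

        // Rule (a): cap each segment at 25% of the buffer so that one
        // large map output cannot monopolize the in-memory merge space.
        synchronized boolean canFitInMemory(long decompressedSize) {
            return decompressedSize < memoryLimit / 4;
        }

        // Rule (b): reserve space before fetching; on failure the copier
        // writes this segment to the reducer's local disk instead.
        synchronized boolean reserve(long decompressedSize) {
            if (used + decompressedSize > memoryLimit) {
                return false;
            }
            used += decompressedSize;
            return true;
        }

        synchronized void release(long bytes) {
            used -= bytes;
        }
    }

If this reading is right, the 25% cap is what keeps a single large map
output from evicting everything else from the in-memory merge, which fits
the performance explanation.
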
Thanks again.

yours,
Ling Kun

On Tue, Mar 12, 2013 at 12:58 AM, Ravi Prakash <ravi...@ymail.com> wrote:
> Hi Ling,
>
> Yes! It is because of performance concerns. We want to keep and merge map
> outputs in memory as much as we can. The amount of memory reserved for
> this purpose is configurable. Obviously, storing fetched map outputs on
> disk, then reading them back from disk to merge them, and then writing
> the merged output back to disk is a lot more expensive than doing it all
> in memory.
>
> Please let us know if you find a case where there was an opportunity to
> keep a map output in memory but we did not, and instead shuffled it to
> disk.
>
> Thanks
> Ravi
>
>
>
>
> ________________________________
> From: Ling Kun <lkun.e...@gmail.com>
> To: mapreduce-dev@hadoop.apache.org
> Sent: Monday, March 11, 2013 5:27 AM
> Subject: Why In-memory Mapoutput is necessary in ReduceCopier
>
> Dear all,
>
> I am focusing on the Mapoutput copier implementation. This part of the
> code fetches map outputs and merges them into a file that can be fed to
> the reduce function. I have the following questions:
>
> 1. All the on-disk map output data is merged together by LocalFSMerger,
> and the in-memory map outputs are merged by InMemFSMergeThread. In
> InMemFSMergeThread there is also a writer object, which writes the merge
> result to outputPath (ReduceTask.java, line 2843). So it seems that after
> merging, both the in-memory and the on-disk map output data end up in the
> local file system. Why not just use local files for all map output data?
>
> 2. After using HTTP to fetch a fragment of a map output file, some of the
> map output data is selected and kept in memory, while the rest is written
> directly to the reducer's local disk. Which map outputs are kept in
> memory is decided in MapOutputCopier.getMapOutput(), which calls
> ramManager.canFitInMemory(). Why not store all the data on disk?
>
> 3. According to the comments, Hadoop will keep a map output in memory
> only if: a) the size of the (decompressed) output is less than 25% of
> the total in-memory filesystem, and b) there is space available in the
> in-memory filesystem. Why? Is it because of performance?
>
>
> Thanks
>
> yours,
> Ling Kun
>
> --
> http://www.lingcc.com

--
http://www.lingcc.com