Dear Ravi and all,

Thanks very much for your kind reply. I am currently wondering whether it
is possible to replace the HTTP GET fetch with some other transfer
mechanism, but so far I have not come up with a better idea.
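
For context, here is my rough reading of how the current fetch works: each
reduce pulls a map output segment from the TaskTracker's embedded HTTP
server. The sketch below is only my simplified understanding, not the
actual MapOutputCopier code; the servlet path and the query parameter
names are illustrative and may not match the real ones exactly.

    import java.io.InputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    // Sketch only: the reducer asks the TaskTracker's HTTP server for one
    // map's output for its partition. The URL layout is illustrative.
    public class ShuffleFetchSketch {
        static InputStream fetchMapOutput(String host, int port,
                                          String jobId, String mapId,
                                          int reducePartition)
                throws Exception {
            URL url = new URL("http://" + host + ":" + port
                    + "/mapOutput?job=" + jobId
                    + "&map=" + mapId
                    + "&reduce=" + reducePartition);
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            // The response body is the serialized (possibly compressed)
            // map output segment for this reduce's partition.
            return conn.getInputStream();
        }
    }

Any replacement would need to cover both this request path and the server
side that serves the map output files, which is part of why I have not
found a better approach yet.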
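
On the in-memory question that Ravi answered below: as I read the code,
the decision comes down to roughly the following. This is a simplified
sketch under my own assumptions, not the real ramManager in
ReduceTask.java, whose accounting is more involved.

    // Sketch of the two rules from the comment: (a) a single
    // (decompressed) map output must be smaller than 25% of the in-memory
    // buffer, and (b) there must be room left, otherwise the segment is
    // shuffled straight to disk.
    public class RamManagerSketch {
        private final long memoryLimit; // bytes for in-memory map outputs
        private long used;              // bytes currently reserved

        RamManagerSketch(long memoryLimit) {
            this.memoryLimit = memoryLimit;
        }

        // Rule (a): cap each segment at 25% of the buffer so that one
        // large map output cannot monopolize the in-memory merge space.
        synchronized boolean canFitInMemory(long decompressedSize) {
            return decompressedSize < memoryLimit / 4;
        }

        // Rule (b): reserve space before fetching; on failure the copier
        // writes this segment to the reducer's local disk instead.
        synchronized boolean reserve(long decompressedSize) {
            if (used + decompressedSize > memoryLimit) {
                return false;
            }
            used += decompressedSize;
            return true;
        }

        synchronized void release(long bytes) {
            used -= bytes;
        }
    }

If this reading is right, the 25% cap is what keeps a single large map
output from evicting everything else from the in-memory merge, which fits
the performance explanation.
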
Thanks again.

yours,
Ling Kun

On Tue, Mar 12, 2013 at 12:58 AM, Ravi Prakash <ravi...@ymail.com> wrote:
> Hi Ling,
>
> Yes! It is because of performance concerns. We want to keep and merge map
> outputs in memory as much as we can. The amount of memory reserved for
> this purpose is configurable. Obviously, storing fetched map outputs on
> disk, then reading them back from disk to merge them, and then writing
> the merged output back to disk is a lot more expensive than doing it all
> in memory.
>
> Please let us know if you find a case where there was an opportunity to
> keep a map output in memory but we did not, and instead shuffled it to
> disk.
>
> Thanks
> Ravi
>
>
>
>
> ________________________________
> From: Ling Kun <lkun.e...@gmail.com>
> To: mapreduce-dev@hadoop.apache.org
> Sent: Monday, March 11, 2013 5:27 AM
> Subject: Why In-memory Mapoutput is necessary in ReduceCopier
>
> Dear all,
>
> I am focusing on the Mapoutput copier implementation. This part of the
> code fetches map outputs and merges them into a file that can be fed to
> the reduce function. I have the following questions:
>
> 1. All the on-disk map output data is merged together by LocalFSMerger,
> and the in-memory map outputs are merged by InMemFSMergeThread. In
> InMemFSMergeThread there is also a writer object, which writes the merge
> result to outputPath (ReduceTask.java, line 2843). So it seems that after
> merging, both the in-memory and the on-disk map output data end up in the
> local file system. Why not just use local files for all map output data?
>
> 2. After using HTTP to fetch a fragment of a map output file, some of the
> map output data is selected and kept in memory, while the rest is written
> directly to the reducer's local disk. Which map outputs are kept in
> memory is decided in MapOutputCopier.getMapOutput(), which calls
> ramManager.canFitInMemory(). Why not store all the data on disk?
>
> 3. According to the comments, Hadoop will keep a map output in memory
> only if: a) the size of the (decompressed) output is less than 25% of
> the total in-memory filesystem, and b) there is space available in the
> in-memory filesystem. Why? Is it because of performance?
>
>
> Thanks
>
> yours,
> Ling Kun
>
> --
> http://www.lingcc.com

--
http://www.lingcc.com