Carp, IMHO, 0.20.x has it. fs.inmemory.size.mb is the reduce-side equivalent of io.sort.mb. In the reducer tasks, intermediate map output is collected into a buffer (whose size is governed by this parameter's value), and data is flushed into files as (partially) sorted KVs.
These files will be re-merged if we end up with more than io.sort.factor files; otherwise KVs will be served to the reduce function directly from these files. I don't know where this is in the code, though, sorry.

cheers,
Sriguru

>-----Original Message-----
>From: Yu Li [mailto:[email protected]]
>Sent: Thursday, July 01, 2010 1:12 PM
>To: [email protected]
>Subject: In which configuration file to configure the
>"fs.inmemory.size.mb" parameter?
>
>Hi all,
>
>I looked through the "Cluster Setup" guide under link
>http://hadoop.apache.org/common/docs/r0.20.1/cluster_setup.html and
>found there's a "fs.inmemory.size.mb" parameter for specifying memory
>allocated for the in-memory file-system used to merge map-outputs at
>the reduces, and this parameter is set in the "core-site.xml". But
>when I checked the "core-default.xml" under path
>"$HADOOP_HOME/src/core/", I didn't find the parameter at all, nor
>could I find the parameter through JTUI after launching jobs.
>Does anybody know about this parameter? Has it been removed from
>release 0.20.X? If it hasn't been removed, how could I set the
>parameter besides using the -D option? Thanks in advance.
>
>Best Regards,
>Carp
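
For anyone reading this thread later: the flush-then-merge behaviour described in the reply can be sketched roughly as below. This is only an illustration in Python, not Hadoop's code. The function name reduce_side_merge is made up, the buffer capacity is counted in records rather than megabytes (standing in for fs.inmemory.size.mb), and plain lists stand in for the spill files.

```python
import heapq


def reduce_side_merge(records, buffer_capacity, sort_factor):
    """Toy model of the reduce-side buffering described in the thread.

    buffer_capacity: records held before a flush (hypothetical stand-in
                     for fs.inmemory.size.mb, which is really in MB).
    sort_factor:     maximum number of sorted runs left open at once
                     (stand-in for io.sort.factor).
    Returns (fully sorted records, number of runs served to reduce).
    """
    if buffer_capacity < 1 or sort_factor < 2:
        raise ValueError("need buffer_capacity >= 1 and sort_factor >= 2")

    buffer, runs = [], []
    for kv in records:
        buffer.append(kv)
        if len(buffer) >= buffer_capacity:
            # Buffer is full: flush it as one sorted run (a "file").
            runs.append(sorted(buffer))
            buffer = []
    if buffer:
        runs.append(sorted(buffer))

    # Re-merge only if there are more runs than sort_factor allows.
    while len(runs) > sort_factor:
        merged = list(heapq.merge(*runs[:sort_factor]))
        runs = [merged] + runs[sort_factor:]

    # Otherwise the KVs are served straight out of the remaining runs.
    return list(heapq.merge(*runs)), len(runs)


if __name__ == "__main__":
    kvs = [(5, "e"), (1, "a"), (4, "d"), (2, "b"), (3, "c"), (0, "z"), (6, "g")]
    ordered, open_runs = reduce_side_merge(kvs, buffer_capacity=2, sort_factor=2)
    print(ordered, open_runs)
```

A smaller buffer produces more runs, and once the run count exceeds the sort factor an extra merge pass is needed before the reduce function sees any data. That is the trade-off the two parameters control in this toy model.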
