(Disclaimer: Not an expert, but looked at that code quite a bit. Hopefully the list will correct any details I get wrong)
In Hadoop 1: the mapper writes its output file to a well-known location on the local machine (the path encodes the user, job ID and map ID). The TaskTracker then serves that file over HTTP to the reducer when it requests it, authenticated using a secret token in the job. Look in the MapOutputServlet class in TaskTracker for most of the related code.

In YARN: similar thing, except that now a NodeManager plug-in (an auxiliary service) serves the map output, since there's no TaskTracker anymore. Look at the ShuffleHandler class in the hadoop-mapreduce-client-shuffle project. I see comments in the code indicating that this will be changed from a NodeManager plug-in in the future, but I don't know much about that.

Hope it helps,
Mostafa

On Mon, Dec 3, 2012 at 10:08 AM, rshepherd <[email protected]> wrote:
> Hi folks,
>
> Can anyone explain to me briefly how each mapper reports the
> location of the intermediate kv partition files to the master? And, if
> possible, where in the code I might find where that happens?
>
> Thanks for any help,
> Randy
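
To make the fetch step described above a bit more concrete, here is a rough Python sketch of the kind of request a reducer makes: an HTTP URL naming the job, the map and the reduce partition, plus a keyed hash of that URL computed with the job's secret token. This is an illustration, not Hadoop's code. The `/mapOutput` path and the `job`/`map`/`reduce` parameter names are from my memory of the servlet, the choice of HMAC-SHA1 over the whole URL is an assumption, and the function names, host, port and IDs are all made up. Check MapOutputServlet or ShuffleHandler for the real details.

```python
import base64
import hashlib
import hmac
from urllib.parse import urlencode


def build_fetch_url(host, port, job_id, map_id, reduce_partition):
    """Build the URL a reducer would request for one map output partition.

    The path and parameter names are assumptions for illustration.
    """
    query = urlencode({"job": job_id, "map": map_id, "reduce": reduce_partition})
    return f"http://{host}:{port}/mapOutput?{query}"


def sign_url(url, job_token):
    """Return a base64 HMAC-SHA1 of the URL, keyed with the job's secret token.

    The exact message and hash Hadoop uses may differ; this only shows the idea.
    """
    digest = hmac.new(job_token, url.encode("utf-8"), hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


def verify(url, signature, job_token):
    """Server side: recompute the hash and compare in constant time."""
    return hmac.compare_digest(sign_url(url, job_token), signature)


# Hypothetical values, for illustration only.
token = b"secret-job-token"
url = build_fetch_url(
    "node1.example.com", 8080,
    "job_201212030001_0001",
    "attempt_201212030001_0001_m_000000_0",
    3,
)
signature = sign_url(url, token)
```

The point of the design is that the server never has to be told who the reducer is: anyone who can produce a valid hash for the URL must hold the job's token, so `verify(url, signature, token)` succeeds only for tasks of that job.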
