Thanks for the clarifications!

________________________________
 From: Arun C Murthy <a...@hortonworks.com>
To: mapreduce-user@hadoop.apache.org 
Sent: Friday, January 6, 2012 10:45 AM
Subject: Re: Yarn related questions:
 

Responses inline:


On Jan 6, 2012, at 9:34 AM, Ann Pal wrote:

Thanks for your reply. Some additional questions:
>[1] How does the application master determine the size (memory requirement) of 
>the container? Can the container be viewed as a JVM with CPU and memory?
Pretty much. It's related to the size of the JVM or any Unix process you want 
to run.


[2] The document mentions a concept of fungibility of resources across 
servers. An allocated container of 2 GB of RAM for a reducer could be spread 
across two servers of 1 GB each. If so, is a task split across 2 servers? Not 
sure how that works.
It means 'fungibility' across map and reduce tasks, i.e. there are no more 
fixed map/reduce slots. A container can't be split across servers.
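
To illustrate the placement rule above, here is a minimal sketch in Python. It 
is a toy model, not Hadoop code: `Node` and `allocate` are hypothetical names, 
and the first-fit policy is an assumption made only for the example. It shows 
that map and reduce containers draw from the same memory pool, and that a 
request which fits on no single node is refused rather than split.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for one server and its unallocated memory."""
    name: str
    free_mb: int


def allocate(nodes: List[Node], request_mb: int) -> Optional[str]:
    """Place a container on the first node with enough free memory.

    Returns the node name, or None when no single node can hold the
    whole request. The request is never split across nodes.
    """
    for node in nodes:
        if node.free_mb >= request_mb:
            node.free_mb -= request_mb
            return node.name
    return None


nodes = [Node("node1", 1024), Node("node2", 1024)]

# 2 GB is free in total, but no single node has 2 GB, so this fails.
big_reducer = allocate(nodes, 2048)

# There are no fixed slots: a map and a reduce container both come
# out of the same per-node memory.
mapper = allocate(nodes, 1024)
small_reducer = allocate(nodes, 1024)
```

Here `big_reducer` is `None`, while the 1 GB map and reduce containers land on 
`node1` and `node2` respectively.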


[3] The application master corresponds to the JobTracker for a given job, and 
the Node Manager corresponds to the TaskTracker in pre-0.23 Hadoop. Is this 
assumption correct?
Pretty much, except that the AM doesn't do any of the resource management the 
JT did; that's done by the ResourceManager.


[4] For data to be transferred from a map node to a reduce node, is it the 
reduce node's "node manager" that periodically polls the application master 
and subsequently pulls map data from the completed map nodes?
No, the reduce task itself fetches map outputs.

The reduce task polls the AM to get information about 'where' map outputs are 
available.
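
The poll-then-fetch flow described above can be sketched as follows. This is a 
toy model in Python, not Hadoop code: `ToyAppMaster`, `ToyReduceTask`, and the 
in-memory `stores` dictionary are hypothetical, and a real fetch would go over 
the network to the node holding the map output. The point is only that the 
reduce task asks the AM 'where' the outputs are and then pulls the data itself.

```python
from typing import Dict, List, Tuple


class ToyAppMaster:
    """Hypothetical AM stand-in: records where finished map outputs live."""

    def __init__(self) -> None:
        self._events: List[Tuple[str, str]] = []  # (map_id, host)

    def map_finished(self, map_id: str, host: str) -> None:
        self._events.append((map_id, host))

    def poll(self, from_index: int) -> List[Tuple[str, str]]:
        """Return the completion events the caller has not seen yet."""
        return self._events[from_index:]


class ToyReduceTask:
    """Hypothetical reduce task that polls the AM and fetches by itself."""

    def __init__(self, am: ToyAppMaster,
                 stores: Dict[str, Dict[str, str]]) -> None:
        self.am = am
        self.stores = stores  # host -> {map_id: map output}
        self.seen = 0         # number of events already processed
        self.fetched: Dict[str, str] = {}

    def poll_and_fetch(self) -> int:
        """Ask the AM for new locations, then pull each output directly."""
        events = self.am.poll(self.seen)
        self.seen += len(events)
        for map_id, host in events:
            self.fetched[map_id] = self.stores[host][map_id]
        return len(events)


stores = {"node1": {"m0": "a=1"}, "node2": {"m1": "b=2"}}
am = ToyAppMaster()
reduce_task = ToyReduceTask(am, stores)

am.map_finished("m0", "node1")
first = reduce_task.poll_and_fetch()   # learns about m0, fetches it
am.map_finished("m1", "node2")
second = reduce_task.poll_and_fetch()  # learns about m1, fetches it
third = reduce_task.poll_and_fetch()   # nothing new to fetch
```

Note that the AM in this sketch never touches the map output itself; it only 
hands out locations.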

hth,
Arun
