Hey Praveen, I'm in the same boat as you re: getting started with the MR2 code. I have a couple of answers and a couple of followup questions for Arun et al. to keep in mind as they're writing a design doc.
On Wed, Jun 15, 2011 at 5:27 AM, Praveen Sripati <praveensrip...@gmail.com> wrote: > Hi, > > - How to specify that an ApplicationMaster use a particular version of the > MapReduce library dynamically? I don't totally grok the question-- doesn't the client-side code that configures the ApplicationMaster decide this? > > - How does the ApplicationManager pick a node to run the ApplicationMaster? > What resource considerations are taken if any while picking a particular > node to run the ApplicationMaster? There is an ApplicationsManager (note the extra 's') that is part of the functionality of the RM. See reference: http://developer.yahoo.com/blogs/hadoop/posts/2011/03/mapreduce-nextgen-scheduler/ > > - Who observes the ResourceManager/ApplicationMaster/NodeManager for > failures to be restarted later? From the blog entry it seems that the state > of the ResourceManager is stored in the ZooKeeper and the state of the > ApplicationManager is stored in the HDFS. So this is the classic problem of any such system-- who watches the watchmen? It seems like the client would be notified when an ApplicationManager failed by the ApplicationSManager (see above blog post again, it's actually a good blog post, it would be great to have a few more of them), the ResourceManager would know when a NodeManager failed, and it falls to an admin and/or an external monitoring system to detect ResourceManager failure and handle the restart. > > - Looks like the containers are based on Linux cgroups. So, is the MRv2 > limited only to the Linux boxes? Yeah, I bumped into this when I was doing a naive build + install on my Mac. Not that I see folks running alot of hadoop clusters on Macs, but it would be cool if the basic build/install just worked on every platform, even if it's just as simple as detecting the platform and skipping the build of the native container-executor stuff. (Note: I actually got the container-executor stuff to build by using the standard Mac tricks, but I'm not sure if it's worth checking in.) > > Hope the design document from Arun will make me ask less queries in this > forum :) > > Thanks, > Praveen > > On Wed, Jun 15, 2011 at 9:17 AM, Mahadev Konar <maha...@apache.org> wrote: > >> Praveen, >> In that case, if a just launched container is released, the NM will >> be notified via the RM that the container is not longer valid and the >> NM will go ahead and kill the container. >> >> >> On Tue, Jun 14, 2011 at 8:38 PM, Praveen Sripati >> <praveensrip...@gmail.com> wrote: >> > Mahadev, >> > >> > MapReduce ApplicationMaster might behave well, but what about custom >> > ApplicationMasters for other models. >> > >> >> Q) What happens if an ApplicationMaster asks a NM to launch a container >> > and >> >> then releases the container in the allocate call later? >> > >> >> A) The Application Master only releases the container once the container >> > is done. >> > >> > Thanks, >> > Praveen >> > >> > On Wed, Jun 15, 2011 at 8:59 AM, Mahadev Konar <maha...@apache.org> >> wrote: >> > >> >> Praveen, >> >> Answers in line: >> >> >> >> > >> >> > Q) What happens if an ApplicationMaster asks a NM to launch a >> container >> >> and >> >> > then releases the container in the allocate call later? >> >> >> >> The Application Master only releases the container once the container is >> >> done. >> >> >> >> > >> >> > Q) So, the NM watches the UNIX Process/Containers and sends the status >> to >> >> > the ApplicationManager. Later the ApplicationManager sends the status >> of >> >> the >> >> > containers in response to the allocate call to the ApplicationMaster. >> Why >> >> > should the ApplicationMaster be aware of the container status, since >> it's >> >> > already tracking the map/reduce tasks in the containers? >> >> >> >> Its just a way to notify the application master as soon as possible >> >> when the containers fail. >> >> This helps in speeding up the notification of failed containers else >> >> AM has to wait for discovering >> >> failures via timeouts. >> >> >> >> > >> >> > Q) Does the ApplicationMaster notify the NodeManager to exit the UNIX >> >> > Process when the map/reduce tasks in that particular container are >> >> > completed? Are the containers re-used? >> >> >> >> Yes it notifes the NM. >> >> >> >> Containers are not re used as of now. In future we do see the >> >> containers being re used but we'll need leases to do that. >> >> >> >> > >> >> > Q) The ApplicationManager asks the NodeManager to create a container >> and >> >> > also launch the map/reduce task in it. From then on the >> >> ApplicationManager >> >> > and Map/Reduce tasks interact directly without the NodeManager. Am I >> >> > correct? >> >> > >> >> I think you mean ApplicationMaster. Yes, the applicationmaster and >> >> map/reduce tasks talk directly >> >> without NM being involved. >> >> >> >> > Praveen >> >> > >> >> > On Wed, Jun 15, 2011 at 12:59 AM, Arun C Murthy <a...@yahoo-inc.com> >> >> wrote: >> >> > >> >> >> >> >> >> On Jun 14, 2011, at 6:31 PM, Praveen Sripati wrote: >> >> >> >> >> >> Hi, >> >> >>> >> >> >>> I have gone through MapReduce NextGen Blog entries and JIRA and have >> >> the >> >> >>> following queries >> >> >>> >> >> >>> There is a single API between the Scheduler and the >> ApplicationMaster: >> >> >>>>> >> >> >>>> >> >> >>> (List <Container> newContainers, List <ContainerStatus> >> >> >>>>> >> >> >>>> containerStatuses) allocate (List <ResourceRequest> ask, >> >> List<Container> >> >> >>> release) >> >> >>> >> >> >>> The AM ask for specific resources via a list of ResourceRequests >> (ask) >> >> >>>>> >> >> >>>> and releases unnecessary Containers which were allocated by the >> >> >>> Scheduler. >> >> >>> >> >> >>> The response contains a list of newly allocated Containers and the >> >> >>>>> >> >> >>>> statuses of application-specific Containers that completed since >> the >> >> >>> previous interaction between the AM and the RM. >> >> >>> >> >> >>> Q) If split-0 is is available in host1, host2 and host3, can >> >> >>> ApplicationMaster request a scheduler for a container on host1 or >> host2 >> >> or >> >> >>> host3? This way the scheduler can allocate the resources more >> >> effectively. >> >> >>> >> >> >>> >> >> >> Yes, absolutely. >> >> >> >> >> >> >> >> >> Q) In a cluster there might be nodes of different capacities, how >> will >> >> the >> >> >>> scheduler know that a particular node has 4 GB and another has 16 GB >> >> RAM >> >> >>> before allocating the resources to the ApplicationMaster? >> >> >>> >> >> >>> >> >> >> The NodeManager informs the RM about its capabilities on >> registration. >> >> The >> >> >> RM allocates appropriate resources to the AM(s). >> >> >> >> >> >> >> >> >> Q) Are the unnecessary containers (List<Container> release) in the >> >> request >> >> >>> released by the ApplicationMaster the ones rejected by the >> >> >>> ApplicationMaster >> >> >>> or those on which the map/reduce tasks have been completed? >> >> >>> >> >> >>> >> >> >> Only unused ones. >> >> >> >> >> >> >> >> >> Q) What does the following in the response contain - "List >> >> >>> <ContainerStatus> >> >> >>> containerStatuses"? >> >> >>> >> >> >>> >> >> >> Status for completed completed containers. >> >> >> >> >> >> >> >> >> Q) Once the ApplicationMaster gets the list of the new containers >> from >> >> the >> >> >>> Scheduler, what is the interaction between the ApplicationMaster and >> >> the >> >> >>> Node Manager? Will the ApplicationMaster ask the Node Manager on the >> >> >>> different nodes to launch/monitor the map/reduce tasks in those >> >> >>> containers? >> >> >>> >> >> >>> >> >> >> No, the AM directly monitors the containers via an >> application-specific >> >> >> protocol. >> >> >> >> >> >> For MR applications we use TaskUmbilicalProtocol. >> >> >> >> >> >> The NM just monitors the unix process and informs the RM on exit of >> the >> >> >> unix process. >> >> >> >> >> >> >> >> >> Q) Does the Scheduler ask the Node Manager to create the containers >> on >> >> the >> >> >>> different nodes? >> >> >>> >> >> >> >> >> >> No, the Scheduler allocates them to the respective AMs who then >> launch >> >> the >> >> >> container by talking to the NM. >> >> >> >> >> >> The NM can securely verify the authenticity of the 'container launch' >> >> >> request, including the resources allocated to the container. >> >> >> >> >> >> >> >> >> >> >> >>> The resource requests are also aggregated by racks and then by the >> >> >>>>> >> >> >>>> special any (*) for all containers. All resource requests are >> subject >> >> to >> >> >>> change via the delta protocol. >> >> >>> >> >> >>> Q) Does (*) mean that the ApplicationMaster is OK with a container >> in >> >> any >> >> >>> rack/host? This might be applicable for Reduce tasks. >> >> >>> >> >> >>> Yes. >> >> >> >> >> >> Hope this helps. >> >> >> >> >> >> Arun >> >> >> >> >> > >> >> >> >> >> >> >> >> -- >> >> thanks >> >> mahadev >> >> @mahadevkonar >> >> >> > >> >> >> >> -- >> thanks >> mahadev >> @mahadevkonar >> >