Hey Praveen,

I'm in the same boat as you re: getting started with the MR2 code. I
have a couple of answers and a couple of follow-up questions for Arun
et al. to keep in mind as they're writing a design doc.

On Wed, Jun 15, 2011 at 5:27 AM, Praveen Sripati
<praveensrip...@gmail.com> wrote:
> Hi,
>
> - How to specify that an ApplicationMaster use a particular version of the
> MapReduce library dynamically?

I don't totally grok the question: doesn't the client-side code that
configures the ApplicationMaster decide this?

>
> - How does the ApplicationManager pick a node to run the ApplicationMaster?
> What resource considerations are taken if any while picking a particular
> node to run the ApplicationMaster?

There is an ApplicationsManager (note the extra 's') that is part of
the functionality of the RM. See reference:
http://developer.yahoo.com/blogs/hadoop/posts/2011/03/mapreduce-nextgen-scheduler/

>
> - Who observes the ResourceManager/ApplicationMaster/NodeManager for
> failures to be restarted later? From the blog entry it seems that the state
> of the ResourceManager is stored in the ZooKeeper and the state of the
> ApplicationManager is stored in the HDFS.

So this is the classic problem of any such system: who watches the
watchmen? It seems like the ApplicationsManager would notify the client
when an ApplicationMaster failed (see the blog post above again; it's
actually a good post, and it would be great to have a few more of
them), the ResourceManager would know when a NodeManager failed, and it
falls to an admin and/or an external monitoring system to detect
ResourceManager failure and handle the restart.
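
To make the "who notices whom" part a bit more concrete, here's a rough
sketch of timeout-based failure detection, i.e. how a ResourceManager
might decide that a NodeManager is gone. This is only my illustration
and not the actual MR2 code: the class name LivenessMonitor, its
methods, and the idea that detection is driven by heartbeat timeouts
are all assumptions on my part.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, NOT part of Hadoop: remembers the last heartbeat
// time per node and reports nodes that have been silent for too long.
final class LivenessMonitor {
    private final long timeoutMillis;
    // TreeMap keeps node ids sorted so the reported order is deterministic.
    private final Map<String, Long> lastSeen = new TreeMap<>();

    LivenessMonitor(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Record a heartbeat from nodeId at time nowMillis.
    void heartbeat(String nodeId, long nowMillis) {
        lastSeen.put(nodeId, nowMillis);
    }

    // Return the nodes whose last heartbeat is older than the timeout,
    // and stop tracking them.
    List<String> expired(long nowMillis) {
        List<String> dead = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastSeen.entrySet()) {
            if (nowMillis - e.getValue() > timeoutMillis) {
                dead.add(e.getKey());
            }
        }
        lastSeen.keySet().removeAll(dead);
        return dead;
    }
}
```

The same shape would apply one level up (an external monitor tracking
the ResourceManager), which is why the top of the chain still needs an
admin or an outside system.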

>
> - Looks like the containers are based on Linux cgroups. So, is the MRv2
> limited only to the Linux boxes?

Yeah, I bumped into this when I was doing a naive build + install on
my Mac. Not that I see folks running a lot of Hadoop clusters on Macs,
but it would be cool if the basic build/install just worked on every
platform, even if that's as simple as detecting the platform and
skipping the build of the native container-executor stuff. (Note: I
actually got the container-executor stuff to build by using the
standard Mac tricks, but I'm not sure if it's worth checking in.)
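
For what it's worth, the "detect the platform and skip" idea could be
as small as the check below. This is a sketch under my own assumptions,
not how the build works today: NativeBuildCheck and shouldBuildNative
are made-up names, and treating "Linux only" as the rule is my guess
based on the cgroups point above. The only real API used is the
standard os.name system property.

```java
import java.util.Locale;

// Hypothetical helper, NOT part of the Hadoop build: decides whether the
// native container-executor should be built on the current platform.
final class NativeBuildCheck {
    // Assumption: only Linux gets the native build; everything else skips it.
    static boolean shouldBuildNative(String osName) {
        return osName != null
            && osName.toLowerCase(Locale.ROOT).startsWith("linux");
    }

    public static void main(String[] args) {
        String os = System.getProperty("os.name");
        System.out.println(shouldBuildNative(os)
            ? "building native container-executor"
            : "skipping native container-executor on " + os);
    }
}
```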

>
> Hope the design document from Arun will make me ask fewer queries in this
> forum :)
>
> Thanks,
> Praveen
>
> On Wed, Jun 15, 2011 at 9:17 AM, Mahadev Konar <maha...@apache.org> wrote:
>
>> Praveen,
>>  In that case, if a just-launched container is released, the NM will
>> be notified via the RM that the container is no longer valid and the
>> NM will go ahead and kill the container.
>>
>>
>> On Tue, Jun 14, 2011 at 8:38 PM, Praveen Sripati
>> <praveensrip...@gmail.com> wrote:
>> > Mahadev,
>> >
>> > MapReduce ApplicationMaster might behave well, but what about custom
>> > ApplicationMasters for other models.
>> >
>> >> Q) What happens if an ApplicationMaster asks a NM to launch a container
>> >> and then releases the container in the allocate call later?
>> >
>> >> A) The Application Master only releases the container once the container
>> >> is done.
>> >
>> > Thanks,
>> > Praveen
>> >
>> > On Wed, Jun 15, 2011 at 8:59 AM, Mahadev Konar <maha...@apache.org> wrote:
>> >
>> >> Praveen,
>> >>  Answers in line:
>> >>
>> >> >
>> >> > Q) What happens if an ApplicationMaster asks a NM to launch a
>> >> > container and then releases the container in the allocate call later?
>> >>
>> >> The Application Master only releases the container once the container is
>> >> done.
>> >>
>> >> >
>> >> > Q) So, the NM watches the UNIX Process/Containers and sends the status
>> >> > to the ApplicationManager. Later the ApplicationManager sends the status
>> >> > of the containers in response to the allocate call to the
>> >> > ApplicationMaster. Why should the ApplicationMaster be aware of the
>> >> > container status, since it's already tracking the map/reduce tasks in
>> >> > the containers?
>> >>
>> >> It's just a way to notify the application master as soon as possible
>> >> when the containers fail. This speeds up the notification of failed
>> >> containers; otherwise the AM has to wait to discover failures via
>> >> timeouts.
>> >>
>> >> >
>> >> > Q) Does the ApplicationMaster notify the NodeManager to exit the UNIX
>> >> > Process when the map/reduce tasks in that particular container are
>> >> > completed? Are the containers re-used?
>> >>
>> >> Yes, it notifies the NM.
>> >>
>> >> Containers are not reused as of now. In the future we do see the
>> >> containers being reused, but we'll need leases to do that.
>> >>
>> >> >
>> >> > Q) The ApplicationManager asks the NodeManager to create a container
>> >> > and also launch the map/reduce task in it. From then on the
>> >> > ApplicationManager and Map/Reduce tasks interact directly without the
>> >> > NodeManager. Am I correct?
>> >> >
>> >> I think you mean ApplicationMaster. Yes, the ApplicationMaster and
>> >> map/reduce tasks talk directly without the NM being involved.
>> >>
>> >> > Praveen
>> >> >
>> >> > On Wed, Jun 15, 2011 at 12:59 AM, Arun C Murthy <a...@yahoo-inc.com> wrote:
>> >> >
>> >> >>
>> >> >> On Jun 14, 2011, at 6:31 PM, Praveen Sripati wrote:
>> >> >>
>> >> >>  Hi,
>> >> >>>
>> >> >>> I have gone through MapReduce NextGen Blog entries and JIRA and have
>> >> >>> the following queries
>> >> >>>
>> >> >>>> There is a single API between the Scheduler and the
>> >> >>>> ApplicationMaster:
>> >> >>>>
>> >> >>>> (List <Container> newContainers, List <ContainerStatus>
>> >> >>>> containerStatuses) allocate (List <ResourceRequest> ask,
>> >> >>>> List<Container> release)
>> >> >>>>
>> >> >>>> The AM asks for specific resources via a list of ResourceRequests
>> >> >>>> (ask) and releases unnecessary Containers which were allocated by the
>> >> >>>> Scheduler.
>> >> >>>>
>> >> >>>> The response contains a list of newly allocated Containers and the
>> >> >>>> statuses of application-specific Containers that completed since the
>> >> >>>> previous interaction between the AM and the RM.
>> >> >>>
>> >> >>> Q) If split-0 is available in host1, host2 and host3, can
>> >> >>> ApplicationMaster request a scheduler for a container on host1 or
>> >> >>> host2 or host3? This way the scheduler can allocate the resources more
>> >> >>> effectively.
>> >> >>>
>> >> >>>
>> >> >> Yes, absolutely.
>> >> >>
>> >> >>
>> >> >>> Q) In a cluster there might be nodes of different capacities, how
>> >> >>> will the scheduler know that a particular node has 4 GB and another has
>> >> >>> 16 GB RAM before allocating the resources to the ApplicationMaster?
>> >> >>>
>> >> >>>
>> >> >> The NodeManager informs the RM about its capabilities on
>> >> >> registration. The RM allocates appropriate resources to the AM(s).
>> >> >>
>> >> >>
>> >> >>> Q) Are the unnecessary containers (List<Container> release) in the
>> >> >>> request released by the ApplicationMaster the ones rejected by the
>> >> >>> ApplicationMaster or those on which the map/reduce tasks have been
>> >> >>> completed?
>> >> >>>
>> >> >>>
>> >> >> Only unused ones.
>> >> >>
>> >> >>
>> >> >>> Q) What does the following in the response contain - "List
>> >> >>> <ContainerStatus> containerStatuses"?
>> >> >>>
>> >> >>>
>> >> >> Status for completed containers.
>> >> >>
>> >> >>
>> >> >>> Q) Once the ApplicationMaster gets the list of the new containers
>> >> >>> from the Scheduler, what is the interaction between the
>> >> >>> ApplicationMaster and the Node Manager? Will the ApplicationMaster ask
>> >> >>> the Node Manager on the different nodes to launch/monitor the
>> >> >>> map/reduce tasks in those containers?
>> >> >>>
>> >> >>>
>> >> >> No, the AM directly monitors the containers via an
>> >> >> application-specific protocol.
>> >> >>
>> >> >> For MR applications we use TaskUmbilicalProtocol.
>> >> >>
>> >> >> The NM just monitors the unix process and informs the RM on exit of
>> >> >> the unix process.
>> >> >>
>> >> >>
>> >> >>> Q) Does the Scheduler ask the Node Manager to create the containers
>> >> >>> on the different nodes?
>> >> >>>
>> >> >>
>> >> >> No, the Scheduler allocates them to the respective AMs who then
>> >> >> launch the container by talking to the NM.
>> >> >>
>> >> >> The NM can securely verify the authenticity of the 'container launch'
>> >> >> request, including the resources allocated to the container.
>> >> >>
>> >> >>
>> >> >>
>> >> >>>> The resource requests are also aggregated by racks and then by the
>> >> >>>> special any (*) for all containers. All resource requests are
>> >> >>>> subject to change via the delta protocol.
>> >> >>>
>> >> >>> Q) Does (*) mean that the ApplicationMaster is OK with a container
>> >> >>> in any rack/host? This might be applicable for Reduce tasks.
>> >> >>>
>> >> >> Yes.
>> >> >>
>> >> >> Hope this helps.
>> >> >>
>> >> >> Arun
>> >> >>
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> thanks
>> >> mahadev
>> >> @mahadevkonar
>> >>
>> >
>>
>>
>>
>> --
>> thanks
>> mahadev
>> @mahadevkonar
>>
>
