[
https://issues.apache.org/jira/browse/MAPREDUCE-279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008517#comment-13008517
]
Todd Lipcon commented on MAPREDUCE-279:
---------------------------------------
Hi Arun. I spent the train ride this morning looking over yarn/src/main/avro in
the branch. Here are a few comments; sorry for the somewhat
stream-of-consciousness format.
- Is the correct suffix still .genavro? Thought we'd changed the name to
.avroidl or something?
- Apache license headers are needed on these files
- Does AvroIDL convert javadoc-style comments on records/protocols into JavaDoc
on generated code? If so we should do more of that.
- AMRMProtocol:
-- the "release" parameter to allocate is strange: (a) it seems the function is
misnamed if you can also release things as you call it, and (b) why isn't it an
array<ContainerId>?
-- if you want to cancel previous resource requests, do you submit a new one
with a negative numContainers?
- ApplicationSubmissionContext:
-- it would be good to have some kind of scheduler-specific parameters here,
e.g. a scheduler might have something beyond just "priority" (perhaps a deadline)
-- using just the URL type directly for resources seems not quite flexible
enough, e.g. one useful construct would be a URL + checksum
-- what's resources_todo going to be?
-- passing "user" - agreed, this should be more flexible than simple string.
-- Why not include a ContainerLaunchContext to specify the container in which
to run the AM? There seem to be lots of duplicated fields.
- ContainerManager:
-- not following YarnContainerTags - these are opaque enums, how do they get
interpolated in a string?
-- how does one access stderr/stdout contents, both while they're being written
and after a container has terminated? (maybe I just haven't gotten to that bit
yet somewhere else)
- yarn-types.avro:
-- For the typesafe ID classes, do we need to specify explicit comparison
orderings? I don't know Avro behavior here.
-- Did you consider making the ids all strings instead of ints? The pro would
be that there could be canonical formats, like "AM-<hex id>" for app masters vs
"C-<hex id>" for containers. AWS does a good job of this.
-- Resource: field names should include units, like "int memoryMB"
-- what are ContainerTokens? could use some extra doc at the protocol layer
here. (I assume this is for security?)
-- The "Container" type doesn't appear
-- the URL record is missing user/password used for http basic auth or s3n auth
-- there are some hard tabs in this file
-- ApplicationMaster:
--- httpPort seems like it would be better described as something like
"httpStatusURL"?
-- LocalResourceVisibility:
--- just to clarify, APPLICATION visibility means "only to this application
submitted by this user". ie if joe and bob both submit MapReduce 2.x.y jobs
with identical jars, it still won't share, even if sha1s match?
--- if bob submits the same application (ie MR 2.x.y) twice, do APPLICATION
visibility files get shared?
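
On the typesafe ID points under yarn-types.avro, here is a minimal sketch of what prefixed string ids with an explicit ordering could look like. It is Python purely for illustration, not Avro IDL or the generated Java code. TypedId, its fields, and parse are hypothetical names, and the "AM" and "C" prefixes are just the examples given above.

```python
import re
from dataclasses import dataclass

# Hypothetical canonical form: "<PREFIX>-<lowercase hex id>", e.g. "AM-1f".
_ID_PATTERN = re.compile(r"([A-Z]+)-([0-9a-f]+)")


@dataclass(frozen=True, order=True)
class TypedId:
    """Illustrative id type; order=True compares (prefix, value) in that order."""

    prefix: str  # kind tag such as "AM" or "C" (assumed, not from the branch)
    value: int   # numeric id, rendered as hex in the string form

    def __str__(self) -> str:
        return f"{self.prefix}-{self.value:x}"

    @classmethod
    def parse(cls, text: str) -> "TypedId":
        # Reject anything that is not exactly "<PREFIX>-<hex>".
        match = _ID_PATTERN.fullmatch(text)
        if match is None:
            raise ValueError(f"not a typed id: {text!r}")
        return cls(match.group(1), int(match.group(2), 16))
```

Sorting the raw strings would put "C-10" before "C-2", while sorting TypedId values compares the numeric part, which is one reason an explicit ordering may be worth specifying if the ids become strings.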
> Map-Reduce 2.0
> --------------
>
> Key: MAPREDUCE-279
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-279
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: jobtracker, tasktracker
> Reporter: Arun C Murthy
> Assignee: Arun C Murthy
> Fix For: 0.23.0
>
> Attachments: MR-279.patch, MR-279.patch, MR-279.sh,
> MR-279_MR_files_to_move.txt
>
>
> Re-factor MapReduce into a generic resource scheduler and a per-job,
> user-defined component that manages the application execution.