[
https://issues.apache.org/jira/browse/MAPREDUCE-4421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13765889#comment-13765889
]
Hitesh Shah commented on MAPREDUCE-4421:
----------------------------------------
[~jlowe] Had a few questions/comments related to the implementation/patch:
- Why does classpath need to include all of common, hdfs and yarn jar
locations? Assuming that MR is running on a YARN-based cluster, shouldn't the
location of the core dependencies come from the cluster deployment i.e. via the
env that the NM sets for a container? I believe the only jars that MR should
have in its uploaded tarball should be the client jars. I understand that there
is no clear boundary for client-side only jars for common and hdfs today ( for
YARN, I believe it should be simple to split out the client-side
requirements ) but it is something we should aim for or assume that the jars
deployed on the cluster are compatible.
- I guess the underlying question is why use the full hadoop tarball and not
just the mapreduce-only tarball? If MR is truly a user-land library, it should
be treated as such and have a separate deployment approach.
- I would vote to make the tarball in HDFS be the only way to run MR on YARN.
Obviously, this cannot be done for 2.x but we should move to this model on
trunk and not support the current approach at all there. Comments?
- The other point is related to configs. Configuration still loads mapred-site
and mapred-default files and new Configuration objects are created on the
cluster. Are these files still expected on the cluster? job.xml does override
these but cluster configs could still have final params. If this is meant to be
addressed in a follow-up jira to ensure all MR configs come from the client,
you can ignore this point for now.
- How do you see framework name extracted from the path to be used? Is it just
a safety check to ensure that it is found in the classpath? Will it have any
relation to a version? A minor nit - framework name seems confusing in relation
to the framework name in use from earlier, i.e. yarn vs. local framework.
- Description in mapred-default.xml for mapreduce.application.framework.path does
not mention the need for the URI fragment and how the fragment is used as a
sanity check to the classpath.
- Regarding versions, it seems like users will need to do 2 things. Change the
location of the tarball on HDFS and modify the classpath. Users will need to
know the exact structure of the classpath. In such a scenario, do defaults even
make sense? On the other hand, if we define a common standard i.e. a base path
for all MR tarballs, with each tarball in a defined structure ( possibly with
version info added later on for the code to infer the structure of the
tarball ), all the user would need to do is specify the base path ( which could
have a default value ) and a version which again has a default value. The
latter approach would require the code to construct the necessary classpath if
the upload path is in use. Do you have any comments on which of the 2
approaches makes more sense? The former is way more flexible but a bit more
complex. The latter is brittle/inflexible with respect to changing tarball
structures but likely easier to enforce a standard on.
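
To make the last two points concrete, here is a rough sketch of both ideas. It is
not taken from the patch. The class and method names, the "mr-framework" alias,
the tarball file name, the share/hadoop layout inside the tarball, and the
assumption that the classpath is a comma-separated list are all hypothetical and
only meant to illustrate the trade-off. The first two methods show the fragment
of the framework path being used as a sanity check against the classpath. The
last two show how a base path plus a version could be turned into the framework
path and classpath by the code instead of by the user.

```java
import java.net.URI;

// Illustrative sketch only; none of these names come from the patch.
final class FrameworkPathSketch {

    // Hypothetical alias that the tarball would be localized under.
    static final String ALIAS = "mr-framework";

    // Returns the URI fragment of the framework path, or null if there is none.
    static String frameworkAlias(String frameworkPath) {
        return URI.create(frameworkPath).getFragment();
    }

    // Sanity check: at least one classpath entry must live under the alias.
    // Assumes a comma-separated classpath.
    static boolean classpathMentionsAlias(String alias, String classpath) {
        if (alias == null || alias.isEmpty()) {
            return false;
        }
        for (String entry : classpath.split(",")) {
            String e = entry.trim();
            if (e.equals(alias) || e.startsWith(alias + "/")) {
                return true;
            }
        }
        return false;
    }

    // Second approach: derive the framework path from a base path and a version.
    // The file naming convention here is an assumption.
    static String frameworkPathFor(String basePath, String version) {
        return basePath + "/mr-framework-" + version + ".tar.gz#" + ALIAS;
    }

    // Second approach: derive the classpath from an assumed fixed tarball layout.
    static String classpathFor(String alias) {
        return alias + "/share/hadoop/mapreduce/*,"
             + alias + "/share/hadoop/mapreduce/lib/*";
    }

    public static void main(String[] args) {
        // Hypothetical base path and version.
        String path = frameworkPathFor("hdfs:///apps/mapreduce", "2.1.0");
        String alias = frameworkAlias(path);
        String classpath = classpathFor(alias);
        System.out.println("framework path: " + path);
        System.out.println("classpath:      " + classpath);
        System.out.println("sanity check:   " + classpathMentionsAlias(alias, classpath));
    }
}
```

With the first approach the user supplies both the path and the classpath, and
only the check applies. With the second, the user supplies just the base path
and version, at the cost of every tarball having to follow the assumed layout.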
> Remove dependency on deployed MR jars
> -------------------------------------
>
> Key: MAPREDUCE-4421
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4421
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Affects Versions: 2.0.0-alpha
> Reporter: Arun C Murthy
> Assignee: Jason Lowe
> Attachments: MAPREDUCE-4421.patch, MAPREDUCE-4421.patch
>
>
> Currently MR AM depends on MR jars being deployed on all nodes via implicit
> dependency on YARN_APPLICATION_CLASSPATH.
> We should stop adding mapreduce jars to YARN_APPLICATION_CLASSPATH and,
> probably, just rely on adding a shaded MR jar along with job.jar to the
> dist-cache.