[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13614471#comment-13614471
 ] 

Robert Joseph Evans commented on MAPREDUCE-4820:
------------------------------------------------

Looking at the confs I see the following in launcher-job.conf.xml

{noformat}
<property><name>mapreduce.job.cache.files</name><value>hdfs://ip-10-113-15-16.ec2.internal:17020/user/root/oozie-oozi/0000003-130320172938946-oozie-oozi-W/mr-node--map-reduce/map-reduce-launcher.jar,hdfs://ip-10-113-15-16.ec2.internal:17020/user/root/examples/apps/map-reduce/lib/oozie-examples-3.3.1.jar</value><source>programatically</source><source>job.xml</source></property>
{noformat}

But there is no mapreduce.job.cache.files set in mr-job.conf.xml.

Also there is no mapreduce.job.cache.archives set in either of these configs.

The missing cache.files seems more likely to be the cause of the issue.  The 
code in MRApps does not manipulate the conf; it just translates it into a Map 
that can be sent to the RM.  This suggests the issue is happening before the 
conf reaches the MRApps code: either while the conf is being generated inside 
the launcher job, or possibly further up, in the MR client code that sets up 
the distributed cache items.

I'm just trying to help you not go chasing a white rabbit in MAPREDUCE-4549 and 
MAPREDUCE-4503.
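
To illustrate the kind of inspection described above, here is a minimal, Hadoop-free sketch of pulling the JAR names out of a mapreduce.job.cache.files value (the class and method names are illustrative, not actual MRApps code); the JAR file name is the identity the MAPREDUCE-4549 dup check keys on:

```java
import java.net.URI;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class CacheFilesInspector {

    // Split the comma-separated property value and keep only the file
    // name of each entry's path.
    static List<String> cacheFileNames(String cacheFilesProp) {
        return Arrays.stream(cacheFilesProp.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .map(s -> {
                    String path = URI.create(s).getPath();
                    return path.substring(path.lastIndexOf('/') + 1);
                })
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Value copied from launcher-job.conf.xml above.
        String prop =
            "hdfs://ip-10-113-15-16.ec2.internal:17020/user/root/oozie-oozi/"
            + "0000003-130320172938946-oozie-oozi-W/mr-node--map-reduce/"
            + "map-reduce-launcher.jar,"
            + "hdfs://ip-10-113-15-16.ec2.internal:17020/user/root/examples/"
            + "apps/map-reduce/lib/oozie-examples-3.3.1.jar";
        // Prints the two JAR names the dup check would compare.
        System.out.println(cacheFileNames(prop));
    }
}
```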
                
> MRApps distributed-cache duplicate checks are incorrect
> -------------------------------------------------------
>
>                 Key: MAPREDUCE-4820
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4820
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am
>    Affects Versions: 2.0.2-alpha
>            Reporter: Alejandro Abdelnur
>            Priority: Blocker
>             Fix For: 2.0.4-alpha
>
>         Attachments: launcher-job.conf.xml, launcher-job.logs.txt, 
> mr-job.conf.xml, mr-job.logs.txt
>
>
> This seems to be a combination of issues that are being exposed in 
> 2.0.2-alpha by MAPREDUCE-4549.
> MAPREDUCE-4549 introduces a check to ensure there are no duplicate JARs 
> in the distributed-cache (using the JAR name as identity).
> In Hadoop 2 (different from Hadoop 1), all JARs in the distributed-cache are 
> symlink-ed to the current directory of the task.
> MRApps, when setting up the DistributedCache 
> (MRApps#setupDistributedCache->parseDistributedCacheArtifacts) assumes that 
> the local resources (this includes files in the CURRENT_DIR/, 
> CURRENT_DIR/classes/ and files in CURRENT_DIR/lib/) are part of the 
> distributed-cache already.
> For systems like Oozie, which use a launcher job to submit the real job, this 
> poses a problem because MRApps is run from the launcher job to submit the 
> real job. The configuration of the real job has the correct distributed-cache 
> entries (no duplicates), but because the current dir has the same files, the 
> submission fails.
> It seems that MRApps should not be checking for dups in the distributed-cache 
> against JARs in the CURRENT_DIR/ or CURRENT_DIR/lib/. The dup check should be 
> done among distributed-cache entries only.
> It seems YARNRunner is symlink-ing all files in the distributed cache into 
> the current directory. In Hadoop 1 this was done only for files added to the 
> distributed-cache using a fragment (i.e. "#FOO") to trigger symlink creation. 
> Marking as a blocker because without a fix for this, Oozie cannot submit jobs 
> to Hadoop 2 (I've debugged Oozie in a live cluster being used by BigTop 
> -thanks Roman- to test their release work, and I've verified that Oozie 3.3 
> does not create duplicated entries in the distributed-cache).
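
The dup check the description argues for can be sketched in a few lines: compare distributed-cache entries only against each other, by the name they would symlink to (the URI fragment if present, as in Hadoop 1, else the file name), and never against JARs already sitting in CURRENT_DIR/ or CURRENT_DIR/lib/. The class and method names below are illustrative, not the actual MRApps code:

```java
import java.net.URI;
import java.util.HashSet;
import java.util.Set;

public class CacheDupCheck {

    // Link name of a cache entry: the "#fragment" if one was given,
    // otherwise the last path component.
    static String linkName(URI uri) {
        if (uri.getFragment() != null) {
            return uri.getFragment();
        }
        String path = uri.getPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }

    // Throws if two cache entries would symlink to the same name.
    // Note: the check ranges over cacheEntries only; nothing in the
    // task's current directory is consulted.
    static void checkNoDuplicates(URI[] cacheEntries) {
        Set<String> seen = new HashSet<>();
        for (URI u : cacheEntries) {
            if (!seen.add(linkName(u))) {
                throw new IllegalArgumentException(
                        "Duplicate distributed-cache link name: " + linkName(u));
            }
        }
    }

    public static void main(String[] args) {
        URI[] entries = {
            URI.create("hdfs://nn/user/root/lib/oozie-examples-3.3.1.jar"),
            URI.create("hdfs://nn/user/root/lib/other.jar#FOO") // fragment wins
        };
        checkNoDuplicates(entries); // distinct link names: no exception
        System.out.println("no duplicates among cache entries");
    }
}
```

Under this scheme the Oozie launcher's conf would pass, because its two cache entries have distinct names, regardless of what already sits in the launcher's current directory.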

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
