[ https://issues.apache.org/jira/browse/HADOOP-2879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12571427#action_12571427 ]

Karam Singh commented on HADOOP-2879:
-------------------------------------

Looking at the code:
StreamJob.java (line 845):
[
boolean b = DistributedCache.checkURIs(fileURIs, archiveURIs);
if (!b)
  fail(LINK_URI);
]

It is observed that StreamJob.java calls checkURIs of DistributedCache.
Looking at the checkURIs code in org.apache.hadoop.DistributedCache.java (line 716
onwards):
[
if (uriFiles != null) {
  for (int i = 0; i < uriFiles.length; i++) {
    String frag1 = uriFiles[i].getFragment();
    if (frag1 == null)
      return false;
    // file-vs-file fragment checks
    for (int j = i + 1; j < uriFiles.length; j++) {
      String frag2 = uriFiles[j].getFragment();
      if (frag2 == null)
        return false;
      if (frag1.equalsIgnoreCase(frag2))
        return false;
    }
    // archive checks -- note these are nested inside the uriFiles loop
    if (uriArchives != null) {
      for (int j = 0; j < uriArchives.length; j++) {
        String frag2 = uriArchives[j].getFragment();
        if (frag2 == null) {
          return false;
        }
        if (frag1.equalsIgnoreCase(frag2))
          return false;
        for (int k = j + 1; k < uriArchives.length; k++) {
          String frag3 = uriArchives[k].getFragment();
          if (frag3 == null)
            return false;
          if (frag2.equalsIgnoreCase(frag3))
            return false;
        }
      }
    }
  }
}
return true;
]
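
Note that every archive check sits inside the loop over uriFiles. To make the gap
concrete, here is a minimal driver (just a sketch; the class name, the
hdfs://namenode:9000 URI and the org.apache.hadoop.filecache import path are my
assumptions, not taken from the issue) showing that an archive URI with no fragment
slips through when no file URIs are given:
[
import java.net.URI;
import org.apache.hadoop.filecache.DistributedCache;  // assumed import path

public class CheckURIsRepro {
  public static void main(String[] args) throws Exception {
    // An archive URI with no #linkname fragment -- exactly the malformed
    // input that -cacheArchive should reject.
    URI[] archives = { new URI("hdfs://namenode:9000/path/archive.jar") };

    // With no -cacheFile option, fileURIs is null, so checkURIs never
    // inspects the archive array and reports the URIs as valid.
    boolean ok = DistributedCache.checkURIs(null, archives);
    System.out.println("checkURIs(null, archives) = " + ok);  // prints true
  }
}
]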

It seems that if uriFiles is null, no checks are done on uriArchives at all. So if the
-cacheFile option is not present, the -cacheArchive URIs are never validated.
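
One possible restructuring (only a sketch of the idea, not an actual patch; the helper
method name is made up) would be to validate each array on its own and then compare
file fragments against archive fragments, so that -cacheArchive URIs are checked even
when -cacheFile is absent:
[
import java.net.URI;

public class CheckURIsSketch {
  // Validates -cacheFile and -cacheArchive URIs independently, then
  // checks for fragment collisions across the two arrays.
  public static boolean checkURIs(URI[] uriFiles, URI[] uriArchives) {
    if (!fragmentsPresentAndUnique(uriFiles))
      return false;
    if (!fragmentsPresentAndUnique(uriArchives))
      return false;
    if (uriFiles != null && uriArchives != null) {
      for (int i = 0; i < uriFiles.length; i++) {
        for (int j = 0; j < uriArchives.length; j++) {
          if (uriFiles[i].getFragment().equalsIgnoreCase(
                uriArchives[j].getFragment()))
            return false;
        }
      }
    }
    return true;
  }

  // Every URI in the array must carry a #linkname fragment, and the
  // fragments must be pairwise distinct (case-insensitive).
  private static boolean fragmentsPresentAndUnique(URI[] uris) {
    if (uris == null)
      return true;
    for (int i = 0; i < uris.length; i++) {
      String frag = uris[i].getFragment();
      if (frag == null)
        return false;
      for (int j = i + 1; j < uris.length; j++) {
        String other = uris[j].getFragment();
        if (other == null || frag.equalsIgnoreCase(other))
          return false;
      }
    }
    return true;
  }
}
]
With that shape, checkURIs(null, archives) would return false for the fragment-less
archive URI in the driver above, and streaming would fail fast with the LINK_URI
message instead of failing later in the map tasks.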


> If #link is missing from uri format of -cacheArchive then streaming does not 
> throw error.
> -----------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2879
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2879
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/streaming
>            Reporter: Karam Singh
>            Priority: Minor
>
> Ran the hadoop streaming command as:
> bin/hadoop jar contrib/streaming/hadoop-*-streaming.jar -input in -output out
> -mapper "xargs cat" -reducer "bin/cat" -cacheArchive hdfs://h:p/pathofJarFile
> Streaming submits the job to the jobtracker and the map fails.
> For the equivalent command with -cacheFile:
> bin/hadoop jar contrib/streaming/hadoop-*-streaming.jar -input in -output out
> -mapper "xargs cat" -reducer "bin/cat" -cacheFile hdfs://h:p/pathofFile
> the following error is reported back:
> [
> You need to specify the uris as hdfs://host:port/#linkname,Please specify a 
> different link name for all of your caching URIs
> ]
> Streaming should check whether #link is present after the URI of -cacheArchive and
> should throw a proper error.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to