[
https://issues.apache.org/jira/browse/HADOOP-2879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12571427#action_12571427
]
Karam Singh commented on HADOOP-2879:
-------------------------------------
Looking at the code:
StreamJob.java (line 845):
[
boolean b = DistributedCache.checkURIs(fileURIs, archiveURIs);
if (!b)
  fail(LINK_URI);
]
It is observed that StreamJob.java calls checkURIs of DistributedCache.
Looking at the checkURIs code in org.apache.hadoop.filecache.DistributedCache.java (line 716 onwards):
[
if (uriFiles != null){
  for (int i = 0; i < uriFiles.length; i++){
    String frag1 = uriFiles[i].getFragment();
    if (frag1 == null)
      return false;
    for (int j=i+1; j < uriFiles.length; j++){
      String frag2 = uriFiles[j].getFragment();
      if (frag2 == null)
        return false;
      if (frag1.equalsIgnoreCase(frag2))
        return false;
    }
    if (uriArchives != null){
      for (int j = 0; j < uriArchives.length; j++){
        String frag2 = uriArchives[j].getFragment();
        if (frag2 == null){
          return false;
        }
        if (frag1.equalsIgnoreCase(frag2))
          return false;
        for (int k=j+1; k < uriArchives.length; k++){
          String frag3 = uriArchives[k].getFragment();
          if (frag3 == null)
            return false;
          if (frag2.equalsIgnoreCase(frag3))
            return false;
        }
      }
    }
  }
}
return true;
]
It seems that if uriFiles is null, no checks are done for uriArchives at all. So if the
-cacheFile option is not present, the -cacheArchive URIs are never validated, which is why
streaming does not report the missing #link.
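A possible shape for a fix, only as a sketch (the class name CheckUrisSketch and the set-based
structure are mine, not an actual patch): validate the two lists independently, so archive URIs
are still checked when uriFiles is null, and track link names in one set to catch duplicates
across -cacheFile and -cacheArchive:
[
import java.net.URI;
import java.util.HashSet;
import java.util.Set;

// Sketch only, not the actual patch.
public class CheckUrisSketch {
  // Check each list on its own, so archives are validated even when
  // uriFiles is null, and reject duplicate #linkname fragments across
  // -cacheFile and -cacheArchive (case-insensitively, like the original).
  public static boolean checkURIs(URI[] uriFiles, URI[] uriArchives) {
    Set<String> fragments = new HashSet<String>();
    for (URI[] uris : new URI[][] { uriFiles, uriArchives }) {
      if (uris == null)
        continue;                       // nothing to check in this list
      for (URI uri : uris) {
        String frag = uri.getFragment();
        if (frag == null)
          return false;                 // missing #linkname fragment
        if (!fragments.add(frag.toLowerCase()))
          return false;                 // duplicate link name
      }
    }
    return true;
  }
}
]
With something like this, the -cacheArchive-only case from the description below should also be
rejected up front with the LINK_URI error, instead of the job being submitted and the maps failing.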
> If #link is missing from uri format of -cacheArchive then streaming does not
> throw error.
> -----------------------------------------------------------------------------------------
>
> Key: HADOOP-2879
> URL: https://issues.apache.org/jira/browse/HADOOP-2879
> Project: Hadoop Core
> Issue Type: Bug
> Components: contrib/streaming
> Reporter: Karam Singh
> Priority: Minor
>
> Ran the hadoop streaming command as:
> bin/hadoop jar contrib/streaming/hadoop-*-streaming.jar -input in -output out
> -mapper "xargs cat" -reducer "bin/cat" -cacheArchive hdfs://h:p/pathofJarFile
> Streaming submits the job to the jobtracker and the map fails.
> For the same command with -cacheFile:
> bin/hadoop jar contrib/streaming/hadoop-*-streaming.jar -input in -output out
> -mapper "xargs cat" -reducer "bin/cat" -cacheFile hdfs://h:p/pathofFile
> the following error is reported back:
> [
> You need to specify the uris as hdfs://host:port/#linkname,Please specify a
> different link name for all of your caching URIs
> ]
> Streaming should check that #link is present after the -cacheArchive URI and
> should throw a proper error.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.