[ 
https://issues.apache.org/jira/browse/MAPREDUCE-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12873667#action_12873667
 ] 

Vinod K V commented on MAPREDUCE-572:
-------------------------------------

Looked at the patch. When we 'check if there is any conflict in fragment 
names', we can do better than O(n^2) comparisons to find duplicates. For 
example, while iterating over the files/archives to see if any fragment is 
null, we could put them in a map keyed by fragment name and fail immediately 
when we encounter a duplicate on a later iteration.

Granted, this is not in any critical section; I am only checking whether we 
can incorporate a minor performance improvement now that the code in question 
is being touched.
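
A minimal sketch of the single-pass check suggested above, not the code from 
the patch. The class name FragmentCheck, the method findFragmentProblem and 
the message texts are hypothetical. It assumes the files and archives have 
already been combined into one array and that link names are compared 
case-sensitively.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Hadoop or the patch.
class FragmentCheck {

    /**
     * Returns null if every URI carries a unique, non-null fragment.
     * Otherwise returns a message describing the first problem found.
     */
    static String findFragmentProblem(URI[] uris) {
        // Maps each fragment (link name) to the first URI that used it.
        Map<String, URI> seen = new HashMap<String, URI>();
        for (URI uri : uris) {
            String fragment = uri.getFragment();
            if (fragment == null) {
                return "Missing #link in " + uri;
            }
            // put() returns the previous value for this key, if any,
            // so a duplicate is detected in the same pass.
            URI previous = seen.put(fragment, uri);
            if (previous != null) {
                return "Link name '" + fragment + "' is used by both "
                        + previous + " and " + uri;
            }
        }
        return null;
    }
}
```

Each URI is visited once and each map operation takes expected constant time, 
so the check is O(n) on average instead of O(n^2).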

> If #link is missing from uri format of -cacheArchive then streaming does not 
> throw error.
> -----------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-572
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-572
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: contrib/streaming
>            Reporter: Karam Singh
>            Assignee: Amareshwari Sriramadasu
>            Priority: Minor
>             Fix For: 0.22.0
>
>         Attachments: patch-572-1.txt, patch-572.txt
>
>
> Ran a hadoop streaming command as follows:
> bin/hadoop jar contrib/streaming/hadoop-*-streaming.jar -input in -output out 
> -mapper "xargs cat"  -reducer "bin/cat" -cacheArchive hdfs://h:p/pathofJarFile
> Streaming submits the job to the jobtracker and the map fails.
> For a similar command with -cacheFile:
> bin/hadoop jar contrib/streaming/hadoop-*-streaming.jar -input in -output out 
> -mapper "xargs cat"  -reducer "bin/cat" -cacheFile hdfs://h:p/pathofFile
> the following error is reported back:
> [
> You need to specify the uris as hdfs://host:port/#linkname,Please specify a 
> different link name for all of your caching URIs
> ]
> Streaming should check for the presence of #link after the URI of 
> -cacheArchive and should throw a proper error.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
