[
https://issues.apache.org/jira/browse/MAPREDUCE-3323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13145669#comment-13145669
]
Robert Joseph Evans commented on MAPREDUCE-3323:
------------------------------------------------
I have read through all of your patches and I have a few comments.
# I don't really like the name current.task.type.internal. It would be
better to prefix it with mapreduce.
# I think it is slightly faster to change {code}fileURI.toArray(new
URI[0]){code} to {code}fileURI.toArray(new URI[fileURI.size()]){code}, but this
is just a nit.
# There are no tests in the patches. I know you have done some manual testing,
but adding/updating the unit tests is important for this to be accepted.
# Have you tested add(Archive|File)ToClassPathFor(Map|Reduce)? They set
"mapred.job.classpath.(archives|files)", so if you use these methods some of the
entries in "mapred.job.classpath.(archives|files)" will not be valid.
# Why are you setting CACHE_(FILE|ARCHIVE)_FOR_(MAP|REDUCE)? It seems like you
could just rely on the existence of CACHE_(ARCHIVES|FILES)_(MAP|REDUCE).
# Could you please add the new user-facing configuration keys to
mapred-default.xml so that they are documented?
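The toArray nit in point 2 can be illustrated with a minimal, self-contained sketch (the URI values are made up for illustration; this is not code from the patch):

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;

public class ToArrayDemo {
    public static void main(String[] args) throws URISyntaxException {
        List<URI> fileURI = new ArrayList<>();
        fileURI.add(new URI("hdfs://nn/cache/file1"));
        fileURI.add(new URI("hdfs://nn/cache/file2"));

        // Zero-length variant: toArray must reflectively allocate a
        // correctly sized URI[] internally before copying.
        URI[] a = fileURI.toArray(new URI[0]);

        // Pre-sized variant: the caller supplies an array of the right
        // size, so toArray just copies into it.
        URI[] b = fileURI.toArray(new URI[fileURI.size()]);

        System.out.println(a.length == b.length && a.length == 2);
    }
}
```

Both variants return the same contents; whether the pre-sized array is actually faster depends on the JVM, which is why this is only a nit.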
> Add a new interface for the Distributed Cache that is specific to Map or
> Reduce, but not both.
> ---------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-3323
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3323
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: distributed-cache, tasktracker
> Affects Versions: 0.20.203.0
> Reporter: Azuryy(Chijiong)
> Fix For: 0.20.203.0
>
> Attachments: DistributedCache.patch, GenericOptionsParser.patch,
> JobClient.patch, TaskDistributedCacheManager.patch, TaskTracker.patch
>
>
> We put some files into the Distributed Cache, but sometimes only the Map or
> the Reduce phase uses these cached files, not both. The TaskTracker always
> downloads all cached files from HDFS, so if the cache contains some fairly
> large files, this wastes time.
> So this patch adds the following new APIs to DistributedCache.java:
> addArchiveToClassPathForMap
> addArchiveToClassPathForReduce
> addFileToClassPathForMap
> addFileToClassPathForReduce
> addCacheFileForMap
> addCacheFileForReduce
> addCacheArchiveForMap
> addCacheArchiveForReduce
> The new API does not affect the original interface. Users can access these
> features in either of the following two ways:
> 1)
> hadoop job **** -files file1 -mapfiles file2 -reducefiles file3 -archives
> arc1 -maparchives arc2 -reducearchives arc3
> 2)
> DistributedCache.addCacheFile(conf, file1);
> DistributedCache.addCacheFileForMap(conf, file2);
> DistributedCache.addCacheFileForReduce(conf, file3);
> DistributedCache.addCacheArchive(conf, arc1);
> DistributedCache.addCacheArchiveForMap(conf, arc2);
> DistributedCache.addCacheArchiveForReduce(conf, arc3);
> These two methods produce the same result. That is:
> You put six items into the distributed cache, file1 ~ file3 and arc1 ~ arc3,
> where file1 and arc1 are cached for both map and reduce;
> file2 and arc2 are cached only for map;
> file3 and arc3 are cached only for reduce.
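A map-only setter like addCacheFileForMap could work roughly as sketched below. This is a hypothetical illustration, not code from the attached patches: the key name "mapreduce.cache.files.map" is an assumed placeholder, and java.util.Properties stands in for Hadoop's Configuration so the sketch is self-contained.

```java
import java.util.Properties;

public class MapOnlyCacheSketch {
    // Assumed key name for the map-only cache-file list; the real patch
    // may use a different configuration key.
    static final String CACHE_FILES_MAP = "mapreduce.cache.files.map";

    // Appends a URI to the comma-separated map-only cache-file list,
    // mirroring how DistributedCache setters accumulate entries.
    static void addCacheFileForMap(Properties conf, String uri) {
        String existing = conf.getProperty(CACHE_FILES_MAP);
        conf.setProperty(CACHE_FILES_MAP,
                existing == null ? uri : existing + "," + uri);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        addCacheFileForMap(conf, "hdfs://nn/cache/file2");
        addCacheFileForMap(conf, "hdfs://nn/cache/file3");
        System.out.println(conf.getProperty(CACHE_FILES_MAP));
    }
}
```

The TaskTracker would then consult the map-only list when localizing files for a map task and skip it for reduce tasks, which is the download saving the issue describes.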
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira