[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13145573#comment-13145573
 ] 

Robert Joseph Evans commented on MAPREDUCE-3323:
------------------------------------------------

I really like the concept.  I have had customers ask for this type of 
functionality as well.  I would suggest that you target this to 0.20.206 as 
that is the next major release of the 0.20 line.  It is not likely to get into 
0.20.203 because it is new feature work, not a bug fix.  Please concatenate all 
of the patches for different files into a single patch.  This can be done by 
running
{code}
svn diff > MAPREDUCE-3323.patch
{code}
from the top-level directory if you are using svn. If you are using git, you 
can run
{code}
git diff -p --no-prefix origin/branch-0.20-security > MAPREDUCE-3323.patch
{code}

Because the patch is not for trunk, you also need to run test-patch.  You can 
look [here|http://wiki.apache.org/hadoop/HowToContribute?action=recall&rev=55] 
for instructions on how to contribute to the 0.20.2XX line.  It is an older 
revision of the wiki page that I link to below.

I would also suggest that you look into adding this same functionality to 
trunk and possibly branch-0.23.  Both of them use MRv2, so the MapReduce 
code there is very different from what you are working on.  You can look 
[here|http://wiki.apache.org/hadoop/HowToContribute] for information on how to 
contribute.

I have not had a chance to look through all of your patch files yet, but what I 
have seen so far looks good.
                
> Add new interface for Distributed Cache, which special  for Map or Reduce,but 
> not Both.
> ---------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3323
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3323
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: distributed-cache, tasktracker
>    Affects Versions: 0.20.203.0
>            Reporter: Azuryy(Chijiong)
>             Fix For: 0.20.203.0
>
>         Attachments: DistributedCache.patch, GenericOptionsParser.patch, 
> JobClient.patch, TaskDistributedCacheManager.patch, TaskTracker.patch
>
>
> We put some files into the Distributed Cache, but sometimes only the Map or 
> the Reduce tasks use these cached files, not both. The TaskTracker still 
> always downloads the cached files from HDFS, so if the cache contains fairly 
> large files, this is time-consuming.
> So, this patch adds some new APIs to DistributedCache.java, as follows:
> addArchiveToClassPathForMap
> addArchiveToClassPathForReduce
> addFileToClassPathForMap
> addFileToClassPathForReduce
> addCacheFileForMap
> addCacheFileForReduce
> addCacheArchiveForMap
> addCacheArchiveForReduce
> The new API does not affect the original interface. Users can use these 
> features in either of the following two ways:
> 1) 
> hadoop job **** -files file1 -mapfiles file2 -reducefiles file3 -archives 
> arc1 -maparchives arc2 -reducearchives arc3
> 2)
> DistributedCache.addCacheFile(conf, file1);
> DistributedCache.addCacheFileForMap(conf, file2);
> DistributedCache.addCacheFileForReduce(conf, file3);
> DistributedCache.addCacheArchive(conf, arc1);
> DistributedCache.addCacheArchiveForMap(conf, arc2);
> DistributedCache.addCacheArchiveForReduce(conf, arc3);
> These two approaches have the same result. That is: 
> you put six files into the distributed cache, file1 ~ file3 and arc1 ~ arc3, 
> but file1 and arc1 are cached for both map and reduce;
> file2 and arc2 are cached only for map;
> file3 and arc3 are cached only for reduce.
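
The scoping rule described in the issue can be illustrated with a small standalone sketch. This is not the code from the attached patches and it does not use any Hadoop classes; the class name `TaskScopedCacheSketch`, the `Scope` and `TaskKind` enums, and the `toLocalize` method are hypothetical names invented for this illustration. It only models which of the six entries a map task or a reduce task would need to download, assuming the semantics stated above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the proposed behaviour; not part of Hadoop or the patch.
public class TaskScopedCacheSketch {
    // Which kind of task an entry is cached for.
    public enum Scope { BOTH, MAP_ONLY, REDUCE_ONLY }

    // The kind of task asking for its cache entries.
    public enum TaskKind { MAP, REDUCE }

    // Insertion-ordered so the output follows the order entries were added.
    private final Map<String, Scope> entries = new LinkedHashMap<>();

    public void add(String path, Scope scope) {
        entries.put(path, scope);
    }

    // Returns only the entries that a task of the given kind has to download.
    public List<String> toLocalize(TaskKind kind) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Scope> e : entries.entrySet()) {
            Scope s = e.getValue();
            boolean wanted = s == Scope.BOTH
                    || (s == Scope.MAP_ONLY && kind == TaskKind.MAP)
                    || (s == Scope.REDUCE_ONLY && kind == TaskKind.REDUCE);
            if (wanted) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        TaskScopedCacheSketch cache = new TaskScopedCacheSketch();
        cache.add("file1", Scope.BOTH);
        cache.add("file2", Scope.MAP_ONLY);
        cache.add("file3", Scope.REDUCE_ONLY);
        cache.add("arc1", Scope.BOTH);
        cache.add("arc2", Scope.MAP_ONLY);
        cache.add("arc3", Scope.REDUCE_ONLY);

        System.out.println(cache.toLocalize(TaskKind.MAP));    // [file1, file2, arc1, arc2]
        System.out.println(cache.toLocalize(TaskKind.REDUCE)); // [file1, file3, arc1, arc3]
    }
}
```

Under these assumptions a map task skips file3 and arc3, and a reduce task skips file2 and arc2, which is where the download time would be saved.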

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
