[
https://issues.apache.org/jira/browse/MAPREDUCE-3323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Azuryy(Chijiong) updated MAPREDUCE-3323:
----------------------------------------
Component/s: tasktracker
Affects Version/s: 0.20.203.0
Release Note:
Tested as follows:
1: Add a cache file for map;
2: Get the cache files in the configure method of both the map and the reduce, then
print a message saying whether each of them can get the cache file.
Conclusion:
It does work!
> Distributed Cache for Map or Reduce or Both
> -------------------------------------------
>
> Key: MAPREDUCE-3323
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3323
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: tasktracker
> Affects Versions: 0.20.203.0
> Reporter: Azuryy(Chijiong)
> Attachments: dc.patch
>
>
> We put some files into the Distributed Cache, but sometimes only Map or only Reduce
> uses these cached files, not both. However, the TaskTracker always downloads the
> cached files from HDFS, and if the cache contains fairly large files, this is
> time-consuming.
> This patch therefore adds some new APIs to DistributedCache.java, as follows:
> addArchiveToClassPathForMap
> addArchiveToClassPathForReduce
> addFileToClassPathForMap
> addFileToClassPathForReduce
> addCacheFileForMap
> addCacheFileForReduce
> addCacheArchiveForMap
> addCacheArchiveForReduce
> The new APIs do not affect the original interface, but each of them applies to only
> map or only reduce, not both.
> If you do need a cache file during both map and reduce, use the original
> interface.
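To illustrate the intent of the description above, here is a minimal, self-contained sketch of phase-scoped cache registration. It is a toy model and not the contents of dc.patch: the class PhaseScopedCache, the Phase enum, the filesToLocalize helper, and the three property keys are all hypothetical names invented for this example. Only the method names addCacheFileForMap and addCacheFileForReduce come from the issue, and their real signatures in the patch may differ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of phase-scoped cache registration; this is not Hadoop code. */
class PhaseScopedCache {
    enum Phase { MAP, REDUCE }

    // Hypothetical property names, invented for this sketch.
    static final String BOTH_KEY = "sketch.cache.files";
    static final String MAP_KEY = "sketch.cache.files.map";
    static final String REDUCE_KEY = "sketch.cache.files.reduce";

    // Stands in for the job configuration: one list of URIs per key.
    private final Map<String, List<String>> conf = new HashMap<>();

    private void add(String key, String uri) {
        conf.computeIfAbsent(key, k -> new ArrayList<>()).add(uri);
    }

    /** Models the original interface: the file is needed by both phases. */
    void addCacheFile(String uri) { add(BOTH_KEY, uri); }

    /** Models the proposed map-only registration. */
    void addCacheFileForMap(String uri) { add(MAP_KEY, uri); }

    /** Models the proposed reduce-only registration. */
    void addCacheFileForReduce(String uri) { add(REDUCE_KEY, uri); }

    /** Files a task of the given phase would need: shared files first, then phase-specific ones. */
    List<String> filesToLocalize(Phase phase) {
        List<String> result = new ArrayList<>(
                conf.getOrDefault(BOTH_KEY, Collections.<String>emptyList()));
        String phaseKey = (phase == Phase.MAP) ? MAP_KEY : REDUCE_KEY;
        result.addAll(conf.getOrDefault(phaseKey, Collections.<String>emptyList()));
        return result;
    }

    public static void main(String[] args) {
        PhaseScopedCache cache = new PhaseScopedCache();
        cache.addCacheFile("hdfs:///shared/dict.txt");        // needed by both phases
        cache.addCacheFileForMap("hdfs:///map/lookup.dat");   // map tasks only
        cache.addCacheFileForReduce("hdfs:///reduce/idx.dat"); // reduce tasks only

        System.out.println("map:    " + cache.filesToLocalize(Phase.MAP));
        System.out.println("reduce: " + cache.filesToLocalize(Phase.REDUCE));
    }
}
```

In this model a map task never sees the reduce-only file and vice versa, which is the download saving the issue is after. Files registered through the original interface still reach both phases.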
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira