zhihai xu created MAPREDUCE-5969:
------------------------------------
Summary: Private non-Archive Files' size add twice in Distributed
Cache directory size calculation.
Key: MAPREDUCE-5969
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5969
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mrv1
Reporter: zhihai xu
Assignee: zhihai xu
The sizes of private non-archive files are added twice when the Distributed
Cache directory size is calculated. The list of private non-archive files is
passed in with the "-files" command-line option. The Distributed Cache
directory size is used to check whether the total size of the cached files
exceeds the cache size limit; the default limit is 10 GB.
I added logging to addCacheInfoUpdate and setSize in
TrackerDistributedCacheManager.java.
I used the following command to test:
hadoop jar ./wordcount.jar org.apache.hadoop.examples.WordCount -files
hdfs://host:8022/tmp/zxu/WordCount.java,hdfs://host:8022/tmp/zxu/wordcount.jar
/tmp/zxu/test_in/ /tmp/zxu/test_out
This adds two files to the distributed cache: WordCount.java and wordcount.jar.
WordCount.java is 2395 bytes and wordcount.jar is 3865 bytes, so the total
should be 6260 bytes.
The log shows that the sizes of these files are added twice: once before the
download to the local node and once more after the download, so the total
number of files becomes 4 instead of 2:
addCacheInfoUpdate size: 6260 num: 2 baseDir: /mapred/local
addCacheInfoUpdate size: 8683 num: 3 baseDir: /mapred/local
addCacheInfoUpdate size: 12588 num: 4 baseDir: /mapred/local
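The arithmetic behind these log lines can be replayed with a small standalone model. This is only a sketch: DoubleCountDemo, Totals and replayReport are hypothetical names, not Hadoop classes, and addCacheInfoUpdate is simplified to take a plain byte count. The first two sizes are the HDFS file lengths quoted above; the last two are the deltas between consecutive log lines (8683 - 6260 and 12588 - 8683), which are slightly larger than the HDFS lengths for a reason the log does not show.

```java
class DoubleCountDemo {
    /** Hypothetical stand-in for the per-base-directory running totals. */
    static final class Totals {
        long size; // accumulated bytes
        int num;   // accumulated entry count

        // Simplified: the real method takes a cache status object.
        void addCacheInfoUpdate(long fileSize) {
            size += fileSize;
            num++;
        }
    }

    /** Replays the four additions implied by the report. */
    static Totals replayReport() {
        Totals totals = new Totals();
        // First pass (getLocalCache): HDFS lengths of the two files.
        totals.addCacheInfoUpdate(2395); // WordCount.java
        totals.addCacheInfoUpdate(3865); // wordcount.jar
        // Second pass (setSize, after download): deltas taken from the log.
        totals.addCacheInfoUpdate(8683 - 6260);
        totals.addCacheInfoUpdate(12588 - 8683);
        return totals;
    }

    public static void main(String[] args) {
        Totals t = replayReport();
        System.out.println("expected size 6260 num 2, got size "
            + t.size + " num " + t.num);
    }
}
```

The model ends with 4 entries and 12588 bytes, matching the last log line, although only 2 files totalling 6260 bytes were requested.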
In the code, the size of a private non-archive file is first added in
getLocalCache:
    if (!isArchive) {
      //for private archives, the lengths come over RPC from the
      //JobLocalizer since the JobLocalizer is the one who expands
      //archives and gets the total length
      lcacheStatus.size = fileStatus.getLen();
      LOG.info("getLocalCache:" + localizedPath + " size = "
          + lcacheStatus.size);
      // Increase the size and sub directory count of the cache
      // from baseDirSize and baseDirNumberSubDir.
      baseDirManager.addCacheInfoUpdate(lcacheStatus);
    }
The size is added a second time in setSize:
    synchronized (status) {
      status.size = size;
      baseDirManager.addCacheInfoUpdate(status);
    }
The fix is to not add the file size again for a private non-archive file after
the download (downloadCacheObject).
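One possible shape for such a fix is sketched below as a standalone model; it is not the actual patch. SetSizeGuardSketch, Totals, Entry and the isArchive field are hypothetical names introduced for illustration, and addCacheInfoUpdate is simplified to take a byte count. The assumption is that setSize should update the totals only for archives, whose length is not known until they are expanded, because non-archive files were already counted in getLocalCache.

```java
class SetSizeGuardSketch {
    /** Hypothetical stand-in for the per-base-directory running totals. */
    static final class Totals {
        long size;
        int num;

        synchronized void addCacheInfoUpdate(long fileSize) {
            size += fileSize;
            num++;
        }
    }

    /** Hypothetical stand-in for a cache status entry. */
    static final class Entry {
        final boolean isArchive;
        long size;

        Entry(boolean isArchive) {
            this.isArchive = isArchive;
        }
    }

    // Mirrors the quoted getLocalCache logic: non-archives are counted here.
    static void getLocalCache(Totals totals, Entry entry, long hdfsLength) {
        if (!entry.isArchive) {
            entry.size = hdfsLength;
            totals.addCacheInfoUpdate(entry.size);
        }
    }

    // Guarded setSize: record the size, but count it only for archives.
    static void setSize(Totals totals, Entry entry, long size) {
        synchronized (entry) {
            entry.size = size;
            if (entry.isArchive) {
                totals.addCacheInfoUpdate(size);
            }
        }
    }

    /** Replays the two-file scenario from the report with the guard in place. */
    static Totals localizeTwoFiles() {
        Totals totals = new Totals();
        Entry java = new Entry(false);
        Entry jar = new Entry(false);
        getLocalCache(totals, java, 2395);
        getLocalCache(totals, jar, 3865);
        setSize(totals, java, 8683 - 6260);  // post-download size from the log
        setSize(totals, jar, 12588 - 8683);
        return totals;
    }

    public static void main(String[] args) {
        Totals t = localizeTwoFiles();
        System.out.println("size " + t.size + " num " + t.num);
    }
}
```

With the guard, the two-file scenario ends at 2 entries and 6260 bytes, while an archive entry would still be counted exactly once, in setSize.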