[ https://issues.apache.org/jira/browse/MAPREDUCE-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799687#action_12799687 ]
Hadoop QA commented on MAPREDUCE-1374:
--------------------------------------

+1 overall.  Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12430108/MAPREDUCE-1374.2.patch
  against trunk revision 898486.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/269/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/269/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/269/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/269/console

This message is automatically generated.

> Reduce memory footprint of FileSplit
> ------------------------------------
>
>                 Key: MAPREDUCE-1374
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1374
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 0.20.1, 0.21.0, 0.22.0
>            Reporter: Zheng Shao
>            Assignee: Zheng Shao
>             Fix For: 0.21.0, 0.22.0
>
>         Attachments: MAPREDUCE-1374.1.patch, MAPREDUCE-1374.2.patch
>
>
> We can have many FileInput objects in the memory, depending on the number of
> mappers.
> It will save tons of memory on JobTracker and JobClient if we intern those
> Strings for host names.
> {code}
> FileInputFormat.java:
>       for (NodeInfo host: hostList) {
>         // Strip out the port number from the host name
> -        retVal[index++] = host.node.getName().split(":")[0];
> +        retVal[index++] = host.node.getName().split(":")[0].intern();
>         if (index == replicationFactor) {
>           done = true;
>           break;
>         }
>       }
> {code}
> More on String.intern():
> http://www.javaworld.com/javaworld/javaqa/2003-12/01-qa-1212-intern.html
> It will also save a lot of memory by changing the class of {{file}} from
> {{Path}} to {{String}}. {{Path}} contains a {{java.net.URI}} which internally
> contains ~10 String fields. This will also be a huge saving.
> {code}
>   private Path file;
> {code}
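For readers unfamiliar with the effect of the intern() change quoted above, here is a minimal standalone sketch. It is not Hadoop code: the class HostNameInterning, its helper methods, and the host name "host1.example.com:50010" are hypothetical and exist only for illustration. It mimics the split(":")[0] call from the patch and counts how many distinct String objects end up being referenced, with and without intern().

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical illustration class; not part of Hadoop or of the patch.
class HostNameInterning {

    /** Strips the ":port" suffix, the same way the loop in the patch does. */
    static String stripPort(String nodeName) {
        return nodeName.split(":")[0];
    }

    /** Builds one host-name entry per split, optionally interning each one. */
    static String[] hostsForSplits(String nodeName, int splits, boolean intern) {
        String[] hosts = new String[splits];
        for (int i = 0; i < splits; i++) {
            String host = stripPort(nodeName);
            hosts[i] = intern ? host.intern() : host;
        }
        return hosts;
    }

    /** Counts distinct String objects by identity (==), not by equals(). */
    static int distinctInstances(String[] names) {
        Map<String, Boolean> seen = new IdentityHashMap<>();
        for (String name : names) {
            seen.put(name, Boolean.TRUE);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        String node = "host1.example.com:50010"; // assumed example value

        // Without intern(): split() typically allocates a fresh String per call,
        // so many equal-but-separate copies of the host name are kept alive.
        System.out.println("plain:    "
                + distinctInstances(hostsForSplits(node, 1000, false)));

        // With intern(): all entries refer to the single canonical instance.
        System.out.println("interned: "
                + distinctInstances(hostsForSplits(node, 1000, true)));
    }
}
```

The interned run always reports a single instance, since String.intern() returns the canonical copy for equal strings. The count for the plain run depends on the JVM's split() implementation, so treat it as indicative rather than guaranteed.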