[ https://issues.apache.org/jira/browse/MAPREDUCE-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800969#action_12800969 ]
Zheng Shao commented on MAPREDUCE-1374:
---------------------------------------

Verified that the test error is unrelated to this patch:
{code}
java.lang.ClassNotFoundException: org.apache.hadoop.mapred.TestTTMemoryReporting
	at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
	at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:169)
{code}

> Reduce memory footprint of FileSplit
> ------------------------------------
>
>                 Key: MAPREDUCE-1374
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1374
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 0.20.1, 0.21.0, 0.22.0
>            Reporter: Zheng Shao
>            Assignee: Zheng Shao
>             Fix For: 0.21.0, 0.22.0
>
>         Attachments: MAPREDUCE-1374.1.patch, MAPREDUCE-1374.2.patch, MAPREDUCE-1374.3.patch
>
>
> We can have many FileSplit objects in memory, depending on the number of
> mappers.
> Interning the host-name Strings will save a lot of memory on the JobTracker
> and JobClient, because many splits refer to the same hosts.
> {code}
> FileInputFormat.java:
>       for (NodeInfo host: hostList) {
>         // Strip out the port number from the host name
> -        retVal[index++] = host.node.getName().split(":")[0];
> +        retVal[index++] = host.node.getName().split(":")[0].intern();
>         if (index == replicationFactor) {
>           done = true;
>           break;
>         }
>       }
> {code}
> More on String.intern():
> http://www.javaworld.com/javaworld/javaqa/2003-12/01-qa-1212-intern.html
> Changing the class of {{file}} from {{Path}} to {{String}} will also save a
> lot of memory: {{Path}} contains a {{java.net.URI}}, which internally
> contains ~10 String fields.
> {code}
>   private Path file;
> {code}
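The effect of the {{intern()}} change can be illustrated with a small standalone sketch. It is not the patch itself: the class {{InternSketch}}, its helper methods, and the sample host names are hypothetical and exist only for illustration. It assumes, as in the snippet above, that each node name has the form {{host:port}}, and it counts distinct String objects by identity to show that interned host names collapse to one instance per host.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical illustration class; not part of Hadoop or of the patch.
public class InternSketch {

    // Strip the port number from "host:port", optionally interning the result.
    static String stripPort(String nodeName, boolean intern) {
        String host = nodeName.split(":")[0];
        return intern ? host.intern() : host;
    }

    // Build one host name per simulated split, cycling over a few hosts.
    // The "hostN:50010" names are made-up sample data.
    static String[] hostsFor(int splits, int distinctHosts, boolean intern) {
        String[] hosts = new String[splits];
        for (int i = 0; i < splits; i++) {
            String nodeName = "host" + (i % distinctHosts) + ":50010";
            hosts[i] = stripPort(nodeName, intern);
        }
        return hosts;
    }

    // Count String objects by identity (==), not by equals().
    static int distinctInstances(String[] values) {
        Map<String, Boolean> seen = new IdentityHashMap<>();
        for (String v : values) {
            seen.put(v, Boolean.TRUE);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        int splits = 1000;
        int distinctHosts = 3;
        System.out.println("String instances without intern(): "
            + distinctInstances(hostsFor(splits, distinctHosts, false)));
        System.out.println("String instances with intern():    "
            + distinctInstances(hostsFor(splits, distinctHosts, true)));
    }
}
```

With interning, the number of retained host-name objects is bounded by the number of distinct hosts rather than by the number of splits, which is the saving the description refers to.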