FileSplit.hosts should have the host names "intern"ed
-----------------------------------------------------
Key: MAPREDUCE-1374
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1374
Project: Hadoop Map/Reduce
Issue Type: Improvement
Affects Versions: 0.20.1, 0.21.0, 0.22.0
Reporter: Zheng Shao

We can have many FileInput objects in memory, depending on the number of mappers. Interning the host-name Strings would save a lot of memory on the JobTracker and JobClient, because the same host names recur across many splits.

{code}
FileInputFormat.java:
      for (NodeInfo host: hostList) {
        // Strip out the port number from the host name
-       retVal[index++] = host.node.getName().split(":")[0];
+       retVal[index++] = host.node.getName().split(":")[0].intern();
        if (index == replicationFactor) {
          done = true;
          break;
        }
      }
{code}

More on String.intern():
http://www.javaworld.com/javaworld/javaqa/2003-12/01-qa-1212-intern.html
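To illustrate why the one-line change helps, here is a minimal standalone sketch, not the actual FileInputFormat code. The class name, the helper names, and the example node names are hypothetical; only String.split() and String.intern() are real JDK calls. It shows that interning makes equal host names share a single String instance, so many splits that list the same host hold references to one object instead of many copies.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: not part of Hadoop.
class HostInternSketch {

    // Mirrors the patched line: strip the port, then intern the result.
    static String stripPortAndIntern(String nodeName) {
        return nodeName.split(":")[0].intern();
    }

    // Counts distinct String *instances* (by identity, not by equals()).
    static int distinctInstances(List<String> hosts) {
        Map<String, Boolean> seen = new IdentityHashMap<>();
        for (String h : hosts) {
            seen.put(h, Boolean.TRUE);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        // Assumed example data: many splits, only two distinct hosts.
        String[] nodeNames = {
            "node1.example.com:50010",
            "node2.example.com:50010",
        };

        List<String> interned = new ArrayList<>();
        for (int split = 0; split < 1000; split++) {
            for (String name : nodeNames) {
                interned.add(stripPortAndIntern(name));
            }
        }

        // With intern(), equal host names are the same object, so the
        // number of distinct instances equals the number of distinct hosts.
        System.out.println("entries:   " + interned.size());
        System.out.println("instances: " + distinctInstances(interned));
    }
}
```

One trade-off to keep in mind: interned strings live in the JVM-wide string pool, so this approach suits a small, frequently repeated set of values such as cluster host names rather than arbitrary unbounded input.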