FileSplit.hosts should have the host names "intern"ed
-----------------------------------------------------

                 Key: MAPREDUCE-1374
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1374
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
    Affects Versions: 0.20.1, 0.21.0, 0.22.0
            Reporter: Zheng Shao


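We can have many FileSplit objects in memory, depending on the number of 
mappers, and each one holds its own array of host name Strings.
Because each name is produced by split(":"), every FileSplit gets a fresh 
String copy even when the host is the same. Interning those Strings so that 
equal host names share one instance should save a lot of memory on the 
JobTracker and JobClient.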

{code}
FileInputFormat.java:

      for (NodeInfo host: hostList) {
        // Strip out the port number from the host name
-        retVal[index++] = host.node.getName().split(":")[0];
+        retVal[index++] = host.node.getName().split(":")[0].intern();
        if (index == replicationFactor) {
          done = true;
          break;
        }
      }
{code}
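
As a rough illustration of why the one-line change matters, the sketch below 
counts how many distinct String objects end up in memory with and without 
intern(). It is not Hadoop code: the class InternDemo, its methods, and the 
node names are hypothetical, and the only behaviour carried over from the 
patch is "strip the port with split(":"), then optionally intern".

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical demo class; not part of Hadoop.
class InternDemo {
    // Mirrors the patched line: strip the port, then optionally intern.
    static String stripPort(String nodeName, boolean intern) {
        String host = nodeName.split(":")[0];
        return intern ? host.intern() : host;
    }

    // Builds the host names that `splits` splits would hold, assuming each
    // split is replicated on the same three (made-up) nodes.
    static String[] hostsFor(int splits, boolean intern) {
        String[] nodes = {
            "node1.example.com:50010",
            "node2.example.com:50010",
            "node3.example.com:50010"
        };
        String[] out = new String[splits * nodes.length];
        int i = 0;
        for (int s = 0; s < splits; s++) {
            for (String n : nodes) {
                out[i++] = stripPort(n, intern);
            }
        }
        return out;
    }

    // Counts distinct String objects by identity (==), not by equals().
    static int distinctObjects(String[] hosts) {
        Map<String, Boolean> seen = new IdentityHashMap<>();
        for (String h : hosts) {
            seen.put(h, Boolean.TRUE);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctObjects(hostsFor(1000, false))); // 3000
        System.out.println(distinctObjects(hostsFor(1000, true)));  // 3
    }
}
```

With 1000 splits over three hosts, the non-interned run keeps 3000 separate 
String objects alive, while the interned run keeps three. The actual saving 
in a real job depends on the number of splits, the replication factor, and 
the length of the host names.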

More on String.intern(): 
http://www.javaworld.com/javaworld/javaqa/2003-12/01-qa-1212-intern.html


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
