[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated MAPREDUCE-1374:
----------------------------------

    Description: 
There can be many FileSplit objects in memory, depending on the number of 
mappers.

Interning the host name Strings will save a lot of memory on the JobTracker 
and JobClient, because the same host names recur across many splits.

{code}
FileInputFormat.java:

      for (NodeInfo host: hostList) {
        // Strip out the port number from the host name
-        retVal[index++] = host.node.getName().split(":")[0];
+        retVal[index++] = host.node.getName().split(":")[0].intern();
        if (index == replicationFactor) {
          done = true;
          break;
        }
      }
{code}

More on String.intern(): 
http://www.javaworld.com/javaworld/javaqa/2003-12/01-qa-1212-intern.html
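
To illustrate why the one-line change helps, here is a minimal standalone sketch. It does not use Hadoop classes: the class name, the helper names, and the sample host name are all assumptions made for the example. It shows that split() produces a separate String object on each call, while intern() maps equal Strings to one shared instance.

```java
class InternHostNames {
    // Hypothetical helper mirroring the patched line: strip the port, then intern.
    static String stripPort(String nodeName) {
        return nodeName.split(":")[0].intern();
    }

    // Same helper without intern(), mirroring the unpatched line.
    static String stripPortNoIntern(String nodeName) {
        return nodeName.split(":")[0];
    }

    public static void main(String[] args) {
        // Sample node name; the host and port are made up for illustration.
        String node = "host1.example.com:50010";

        String a = stripPortNoIntern(node);
        String b = stripPortNoIntern(node);
        // Equal contents, but each call allocated its own String.
        System.out.println("equal contents: " + a.equals(b));
        System.out.println("same object without intern: " + (a == b));

        String c = stripPort(node);
        String d = stripPort(node);
        // intern() returns the canonical instance, so both references match.
        System.out.println("same object with intern: " + (c == d));
    }
}
```

With many splits pointing at the same set of hosts, only one copy of each host name stays reachable; the per-call copies become garbage.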


Changing the type of the {{file}} field from {{Path}} to {{String}} will also 
save a lot of memory: {{Path}} contains a {{java.net.URI}}, which internally 
holds ~10 String fields.

{code}
  private Path file;
{code}
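
One possible shape for this change, sketched without Hadoop on the classpath: the split keeps only a String and builds the heavier object when a caller asks for it. The class name {{CompactSplit}}, its fields, and the sample URI are hypothetical, and {{java.net.URI}} stands in for {{Path}} here; this is not the actual FileSplit code.

```java
import java.net.URI;

class CompactSplit {
    // Stored as a plain String instead of a Path-like object.
    private final String file;
    private final long start;
    private final long length;

    CompactSplit(String file, long start, long length) {
        this.file = file;
        this.start = start;
        this.length = length;
    }

    String getFile() { return file; }
    long getStart() { return start; }
    long getLength() { return length; }

    // Build the richer object on demand rather than holding it per split.
    URI getUri() {
        return URI.create(file);
    }

    public static void main(String[] args) {
        // Sample location; host, port, and path are made up for illustration.
        CompactSplit split =
            new CompactSplit("hdfs://namenode:8020/user/data/part-00000", 0L, 64L);
        System.out.println(split.getFile());
        System.out.println(split.getUri().getPath());
    }
}
```

The trade-off is that each call to {{getUri()}} parses the String again, so this only pays off if the object is held far more often than it is dereferenced.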



  was:
We can have many FileInput objects in the memory, depending on the number of 
mappers.
It will save tons of memory on JobTracker and JobClient if we intern those 
Strings for host names.

{code}
FileInputFormat.java:

      for (NodeInfo host: hostList) {
        // Strip out the port number from the host name
-        retVal[index++] = host.node.getName().split(":")[0];
+        retVal[index++] = host.node.getName().split(":")[0].intern();
        if (index == replicationFactor) {
          done = true;
          break;
        }
      }
{code}

More on String.intern(): 
http://www.javaworld.com/javaworld/javaqa/2003-12/01-qa-1212-intern.html


        Summary: Reduce memory footprint of FileSplit  (was: FileSplit.hosts 
should have the host names "intern"ed)

> Reduce memory footprint of FileSplit
> ------------------------------------
>
>                 Key: MAPREDUCE-1374
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1374
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 0.20.1, 0.21.0, 0.22.0
>            Reporter: Zheng Shao

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
