[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802847#action_12802847
 ] 

Amar Kamat commented on MAPREDUCE-1374:
---------------------------------------

A few comments:
1) String.intern() takes up space in the PermGen area, so JobClient must not 
run into an OOM just because its PermGen space is small. What should we do 
about it? With this patch, an existing client that has low PermGen space and 
tries to submit a job with large input splits will fail.
2) Can you add a simple testcase that checks the FileSplit getters and also 
FileSplit serialization? The reason is that one of the serialized parameters 
has changed.
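
To make point 2 concrete, here is a minimal sketch of the kind of round-trip 
check I have in mind. SimpleSplit is a hypothetical stand-in that only mimics 
the write/readFields shape with a String file field; it is not the Hadoop 
FileSplit class, and the path and offsets are made-up values. The real test 
would use FileSplit itself.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class SplitRoundTrip {

    /** Hypothetical stand-in for FileSplit; not the Hadoop class. */
    static final class SimpleSplit {
        private String file;   // assumption: file kept as a String, not a Path
        private long start;
        private long length;

        SimpleSplit() { }

        SimpleSplit(String file, long start, long length) {
            this.file = file;
            this.start = start;
            this.length = length;
        }

        String getFile() { return file; }
        long getStart() { return start; }
        long getLength() { return length; }

        // Serialize the three fields in a fixed order.
        void write(DataOutputStream out) throws IOException {
            out.writeUTF(file);
            out.writeLong(start);
            out.writeLong(length);
        }

        // Read the fields back in the same order they were written.
        void readFields(DataInputStream in) throws IOException {
            file = in.readUTF();
            start = in.readLong();
            length = in.readLong();
        }
    }

    /** Serializes the split to a byte buffer and reads it into a fresh object. */
    static SimpleSplit roundTrip(SimpleSplit original) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            original.write(out);
            out.flush();

            SimpleSplit copy = new SimpleSplit();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(buffer.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        SimpleSplit original = new SimpleSplit("/user/test/part-00000", 128L, 4096L);
        SimpleSplit copy = roundTrip(original);

        // Getter checks: every field must survive the round trip unchanged.
        if (!original.getFile().equals(copy.getFile())
                || original.getStart() != copy.getStart()
                || original.getLength() != copy.getLength()) {
            throw new AssertionError("fields changed during serialization");
        }
        System.out.println("round trip preserved all fields");
    }
}
```

The point of the check is that a change in how the file field is stored 
should not change what the getters return after deserialization.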


> Reduce memory footprint of FileSplit
> ------------------------------------
>
>                 Key: MAPREDUCE-1374
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1374
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 0.20.1, 0.21.0, 0.22.0
>            Reporter: Zheng Shao
>            Assignee: Zheng Shao
>             Fix For: 0.21.0, 0.22.0
>
>         Attachments: MAPREDUCE-1374.1.patch, MAPREDUCE-1374.2.patch, 
> MAPREDUCE-1374.3.patch
>
>
> We can have many FileSplit objects in memory, depending on the number of 
> mappers.
> It will save tons of memory on JobTracker and JobClient if we intern those 
> Strings for host names.
> {code}
> FileInputFormat.java:
>       for (NodeInfo host: hostList) {
>         // Strip out the port number from the host name
> -        retVal[index++] = host.node.getName().split(":")[0];
> +        retVal[index++] = host.node.getName().split(":")[0].intern();
>         if (index == replicationFactor) {
>           done = true;
>           break;
>         }
>       }
> {code}
> More on String.intern(): 
> http://www.javaworld.com/javaworld/javaqa/2003-12/01-qa-1212-intern.html
> Changing the class of {{file}} from {{Path}} to {{String}} will also save a 
> lot of memory. {{Path}} contains a {{java.net.URI}}, which internally 
> contains ~10 String fields, so this is a large saving as well.
> {code}
>   private Path file;
> {code}
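
For reference, the effect of the interning change in the quoted description 
can be seen in a small standalone sketch. It uses the same 
split(":")[0].intern() expression as the patch, but the class name, method 
names, and host name are made up for illustration. It only demonstrates that 
equal host strings collapse to one shared instance; it says nothing about 
where the JVM keeps the intern pool or how much space it uses.

```java
public class InternDemo {

    // Same expression as the patch: strip the port, then intern the host part.
    static String hostOf(String nodeName) {
        return nodeName.split(":")[0].intern();
    }

    // The pre-patch expression: every call yields a fresh String object.
    static String hostOfNoIntern(String nodeName) {
        return nodeName.split(":")[0];
    }

    public static void main(String[] args) {
        String name = "node17.example.com:50010"; // hypothetical node name

        String a = hostOf(name);
        String b = hostOf(name);
        String c = hostOfNoIntern(name);
        String d = hostOfNoIntern(name);

        // Interned results are the same object, so only one copy is retained.
        System.out.println("interned shares instance: " + (a == b));
        // Without interning, equal content lives in separate objects.
        System.out.println("plain shares instance: " + (c == d));
        System.out.println("content equal: " + a.equals(c));
    }
}
```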

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
