[ 
https://issues.apache.org/jira/browse/HADOOP-5059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12664268#action_12664268
 ] 

Doug Cutting commented on HADOOP-5059:
--------------------------------------

Based on the descriptions here:

http://lists.uclibc.org/pipermail/busybox/2005-December/017513.html

and here:

http://www.unixguide.net/unix/programming/1.1.2.shtml

It seems like Java is correct to use fork()+exec(), not vfork()+exec().  But it 
also seems that with really big processes, if your swap space isn't huge and you 
don't have overcommit_memory=1, you'll inevitably see these problems when you 
fork.  The standard workaround seems to be to keep a subprocess around and 
re-use it, which has its own set of problems.
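
A minimal Java sketch of that keep-a-subprocess-around workaround, under stated 
assumptions: a POSIX /bin/sh exists, the helper is started once early (ideally 
before the heap has grown), and each command writes to stdout without reading 
stdin.  The class name ReusableShell and the end-marker protocol are 
hypothetical illustrations, not part of Hadoop's Shell API.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper: forks one long-lived /bin/sh early and reuses it, so
 * the large JVM does not fork again for every command.  The small shell does
 * the per-command fork()+exec() instead.
 */
public class ReusableShell implements AutoCloseable {
    // Echoed after each command so we can tell where its output ends.
    private static final String END_MARKER = "__REUSABLE_SHELL_END__";

    private final Process shell;
    private final Writer toShell;
    private final BufferedReader fromShell;

    public ReusableShell() throws IOException {
        ProcessBuilder pb = new ProcessBuilder("/bin/sh");
        pb.redirectErrorStream(true); // merge stderr into stdout: one reader suffices
        shell = pb.start();
        toShell = new OutputStreamWriter(shell.getOutputStream(), StandardCharsets.UTF_8);
        fromShell = new BufferedReader(
                new InputStreamReader(shell.getInputStream(), StandardCharsets.UTF_8));
    }

    /** Runs one command line in the shared shell and returns its output lines. */
    public synchronized List<String> run(String commandLine) throws IOException {
        toShell.write(commandLine + "\necho " + END_MARKER + "\n");
        toShell.flush();
        List<String> lines = new ArrayList<>();
        String line;
        while ((line = fromShell.readLine()) != null) {
            // endsWith (not equals) handles output that lacks a trailing newline.
            if (line.endsWith(END_MARKER)) {
                String rest = line.substring(0, line.length() - END_MARKER.length());
                if (!rest.isEmpty()) {
                    lines.add(rest);
                }
                return lines;
            }
            lines.add(line);
        }
        throw new IOException("helper shell exited unexpectedly");
    }

    @Override
    public void close() throws IOException {
        try {
            toShell.write("exit\n");
            toShell.close(); // flushes, then closes the shell's stdin
        } finally {
            try {
                shell.waitFor();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                shell.destroy();
            }
        }
    }
}
```

Typical use would be something like 
try (ReusableShell sh = new ReusableShell()) { sh.run("whoami"); }.  The 
trade-off is shared state: commands are serialized through one shell, and a 
command that hangs, reads stdin, or exits the shell takes the helper down with 
it, which is part of the "own set of problems" mentioned above.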

If you have either lots of swap space configured or overcommit_memory=1, then I 
don't think there's any performance penalty to using fork().  The new process 
has a huge address space that's nearly entirely shared with its parent for a 
short time, then shrinks to something tiny as soon as the command is exec'd, so 
it's harmless.  So these solutions (increased swap or overcommit_memory=1) seem 
reasonable to me.
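
To see which of these situations a given node is in, here is a small Linux-only 
sketch that reads /proc/sys/vm/overcommit_memory.  The class and method names 
are hypothetical; the mode meanings (0 = heuristic, 1 = always overcommit, 
2 = strict accounting) follow the kernel's overcommit-accounting documentation.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical diagnostic: reports the kernel's vm.overcommit_memory mode. */
public class OvercommitCheck {
    static final Path SYSCTL = Paths.get("/proc/sys/vm/overcommit_memory");

    /** Parses the file's contents, e.g. "0\n", into the numeric mode. */
    static int parseMode(String text) {
        return Integer.parseInt(text.trim());
    }

    /** Describes what each mode means for fork() of a very large process. */
    static String describe(int mode) {
        switch (mode) {
            case 0:
                return "heuristic overcommit (default): the kernel may refuse a fork() of a very large process";
            case 1:
                return "always overcommit: fork() is not refused for lack of commit space";
            case 2:
                return "strict accounting against swap plus vm.overcommit_ratio percent of RAM";
            default:
                return "unknown mode " + mode;
        }
    }

    static int readMode(Path path) throws IOException {
        return parseMode(new String(Files.readAllBytes(path), StandardCharsets.US_ASCII));
    }

    public static void main(String[] args) throws IOException {
        int mode = readMode(SYSCTL);
        System.out.println("vm.overcommit_memory = " + mode + ": " + describe(mode));
    }
}
```

The Java side can only observe the setting; changing it is done as root, e.g. 
with sysctl -w vm.overcommit_memory=1, or in /etc/sysctl.conf to persist it 
across reboots.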

At root this seems like a bug in Linux: you cannot spawn a new subprocess 
without temporarily using as much address space as the parent process.  But it 
does not seem like a bug that's likely to be fixed soon.

Does this analysis sound right to others?


> 'whoami', 'topologyscript' calls failing with java.io.IOException: error=12, 
> Cannot allocate memory
> ---------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5059
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5059
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: util
>         Environment: On nodes with 
> physical memory 32G
> Swap 16G 
> Primary/Secondary Namenode using 25G of heap or more
>            Reporter: Koji Noguchi
>         Attachments: TestSysCall.java
>
>
> We've seen primary/secondary namenodes fail when calling whoami or 
> topologyscripts.
> (Discussed as part of HADOOP-4998)
> Sample stack traces.
> Primary Namenode
> {noformat}
> 2009-01-12 03:57:27,381 WARN org.apache.hadoop.net.ScriptBasedMapping: 
> java.io.IOException: Cannot run program
> "/path/topologyProgram" (in directory "/path"):
> java.io.IOException: error=12, Cannot allocate memory
>         at java.lang.ProcessBuilder.start(ProcessBuilder.java:459)
>         at org.apache.hadoop.util.Shell.runCommand(Shell.java:149)
>         at org.apache.hadoop.util.Shell.run(Shell.java:134)
>         at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:286)
>         at 
> org.apache.hadoop.net.ScriptBasedMapping.runResolveCommand(ScriptBasedMapping.java:122)
>         at 
> org.apache.hadoop.net.ScriptBasedMapping.resolve(ScriptBasedMapping.java:73)
>         at 
> org.apache.hadoop.dfs.FSNamesystem$ResolutionMonitor.run(FSNamesystem.java:1869)
>         at java.lang.Thread.run(Thread.java:619)
> Caused by: java.io.IOException: java.io.IOException: error=12, Cannot 
> allocate memory
>         at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
>         at java.lang.ProcessImpl.start(ProcessImpl.java:65)
>         at java.lang.ProcessBuilder.start(ProcessBuilder.java:452)
>         ... 7 more
> 2009-01-12 03:57:27,381 ERROR org.apache.hadoop.fs.FSNamesystem: The resolve 
> call returned null! Using /default-rack
> for some hosts
> 2009-01-12 03:57:27,381 INFO org.apache.hadoop.net.NetworkTopology: Adding a 
> new node: /default-rack/55.5.55.55:50010
> {noformat}
> Secondary Namenode
> {noformat}
> 2008-10-09 02:00:58,288 ERROR org.apache.hadoop.dfs.NameNode.Secondary: 
> java.io.IOException:
> javax.security.auth.login.LoginException: Login failed: Cannot run program 
> "whoami": java.io.IOException:
> error=12, Cannot allocate memory
>         at 
> org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:250)
>         at 
> org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:275)
>         at 
> org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:257)
>         at 
> org.apache.hadoop.dfs.FSNamesystem.setConfigurationParameters(FSNamesystem.java:370)
>         at org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:359)
>         at 
> org.apache.hadoop.dfs.SecondaryNameNode.doMerge(SecondaryNameNode.java:340)
>         at 
> org.apache.hadoop.dfs.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:312)
>         at 
> org.apache.hadoop.dfs.SecondaryNameNode.run(SecondaryNameNode.java:223)
>         at java.lang.Thread.run(Thread.java:619)
>         at 
> org.apache.hadoop.dfs.FSNamesystem.setConfigurationParameters(FSNamesystem.java:372)
>         at org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:359)
>         at 
> org.apache.hadoop.dfs.SecondaryNameNode.doMerge(SecondaryNameNode.java:340)
>         at 
> org.apache.hadoop.dfs.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:312)
>         at 
> org.apache.hadoop.dfs.SecondaryNameNode.run(SecondaryNameNode.java:223)
>         at java.lang.Thread.run(Thread.java:619)
> {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
