[ 
https://issues.apache.org/jira/browse/HADOOP-8396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13273881#comment-13273881
 ] 

Catalin Alexandru Zamfir commented on HADOOP-8396:
--------------------------------------------------

According to this article: http://blog.egilh.com/2006/06/2811aspx.html, the 
latest JVM allocates about 1 MB of stack per thread. That makes the 3,000 to 
4,000 threads I'm seeing consistent with the node I'm running this on, which 
has about 3 GB available for user space. In theory I could reduce the stack 
size for native threads via -Xss, but that would only raise the thread limit 
without actually resolving the problem. I think the problem is that Hadoop 
should let go of native threads that have already written their data to HDFS. 
I've checked: after writing a few million records, running a "reader" class on 
that data returns it, meaning Hadoop did write those records to HDFS. But 
watching "htop", the thread count and memory for this process kept increasing, 
and only once writing had started. We're writing from a single thread (main).
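
As a rough check of that arithmetic, here is a minimal sketch. It is not 
Hadoop code, and the class and method names are made up for illustration. It 
assumes about 3 GB of usable user space and a 1 MB stack per thread, and it 
ignores the heap and every other native allocation, so the real ceiling would 
be lower than what it prints.

```java
public class ThreadBudget {
    // Upper bound on native threads: usable address space divided by the
    // per-thread stack size. Heap and other native memory are ignored.
    static long maxThreads(long userSpaceBytes, long stackBytes) {
        if (stackBytes <= 0) {
            throw new IllegalArgumentException("stackBytes must be positive");
        }
        return userSpaceBytes / stackBytes;
    }

    public static void main(String[] args) {
        long userSpace = 3L * 1024 * 1024 * 1024; // assumed: about 3 GB
        long oneMb = 1024L * 1024;                // assumed default stack size
        long halfMb = 512L * 1024;                // e.g. running with -Xss512k

        System.out.println("1 MB stacks:   " + maxThreads(userSpace, oneMb));
        System.out.println("512 KB stacks: " + maxThreads(userSpace, halfMb));
    }
}
```

Halving the stack size doubles the bound, which is why -Xss only postpones 
the failure if threads are never released.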

Hadoop should let go of native threads, or instruct the JVM to release these 
threads, once it knows it has written the corresponding data.
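
To watch this from inside the writer process rather than from "htop", 
something like the sketch below could be called every few million records. 
It uses only standard JDK calls (ThreadMXBean and Thread.getAllStackTraces). 
The class name and the grouping by first word of the thread name are my own 
choices for illustration; the stack traces in the issue show thread names 
starting with "DataStreamer" and "ResponseProcessor", so those groups should 
show up if such threads pile up.

```java
import java.lang.management.ManagementFactory;
import java.util.Map;
import java.util.TreeMap;

public class ThreadCensus {
    // Number of live threads in this JVM, as reported by the JDK.
    static int liveThreads() {
        return ManagementFactory.getThreadMXBean().getThreadCount();
    }

    // Groups live threads by the first word of their name, for example
    // "DataStreamer for file ..." is counted under "DataStreamer".
    static Map<String, Integer> countByFirstWord() {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            String name = t.getName();
            int space = name.indexOf(' ');
            String key = space < 0 ? name : name.substring(0, space);
            Integer old = counts.get(key);
            counts.put(key, old == null ? 1 : old + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println("live threads: " + liveThreads());
        System.out.println("by name:      " + countByFirstWord());
    }
}
```

If one group keeps growing while the number of open writers stays flat, that 
would point at threads not being released after the data is written.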
                
> DataStreamer, OutOfMemoryError, unable to create new native thread
> ------------------------------------------------------------------
>
>                 Key: HADOOP-8396
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8396
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 1.0.2
>         Environment: Ubuntu 64bit, 4GB of RAM, Core Duo processors, commodity 
> hardware.
>            Reporter: Catalin Alexandru Zamfir
>            Priority: Blocker
>              Labels: DataStreamer, I/O, OutOfMemoryError, ResponseProcessor, 
> hadoop,, leak, memory, rpc,
>
> We're trying to write a few billion records via "Avro", and we got this 
> error, which is unrelated to our code:
> 10725984 [Main] INFO net.gameloft.RnD.Hadoop.App - ## At: 2:58:43.290 # 
> Written: 521000000 records
> Exception in thread "DataStreamer for file 
> /Streams/Cubed/Stuff/objGame/aRandomGame/objType/aRandomType/2012/05/11/20/29/Shard.avro
>  block blk_3254486396346586049_75838" java.lang.OutOfMemoryError: unable to 
> create new native thread
>         at java.lang.Thread.start0(Native Method)
>         at java.lang.Thread.start(Thread.java:657)
>         at 
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:612)
>         at 
> org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:184)
>         at org.apache.hadoop.ipc.Client.getConnection(Client.java:1202)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1046)
>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
>         at $Proxy8.getProtocolVersion(Unknown Source)
>         at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)
>         at 
> org.apache.hadoop.hdfs.DFSClient.createClientDatanodeProtocolProxy(DFSClient.java:160)
>         at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:3117)
>         at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2200(DFSClient.java:2586)
>         at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2790)
> 10746169 [Main] INFO net.gameloft.RnD.Hadoop.App - ## At: 2:59:03.474 # 
> Written: 522000000 records
> Exception in thread "ResponseProcessor for block 
> blk_4201760269657070412_73948" java.lang.OutOfMemoryError
>         at sun.misc.Unsafe.allocateMemory(Native Method)
>         at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:117)
>         at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:305)
>         at sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:75)
>         at sun.nio.ch.IOUtil.read(IOUtil.java:223)
>         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:254)
>         at 
> org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
>         at 
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>         at 
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>         at 
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>         at java.io.DataInputStream.readFully(DataInputStream.java:195)
>         at java.io.DataInputStream.readLong(DataInputStream.java:416)
>         at 
> org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.readFields(DataTransferProtocol.java:124)
>         at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2964)
> #
> # There is insufficient memory for the Java Runtime Environment to continue.
> # Native memory allocation (malloc) failed to allocate 32 bytes for intptr_t 
> in 
> /build/buildd/openjdk-6-6b23~pre11/build/openjdk/hotspot/src/share/vm/runtime/deoptimization.cpp
> [thread 1587264368 also had an error]
> [thread 1111309168 also had an error]
> [thread 1820371824 also had an error]
> [thread 1343454064 also had an error]
> [thread 1345444720 also had an error]
> # An error report file with more information is saved as:
> # [thread 1345444720 also had an error]
> [thread -1091290256 also had an error]
> [thread 678165360 also had an error]
> [thread 678497136 also had an error]
> [thread 675511152 also had an error]
> [thread 1385937776 also had an error]
> [thread 911969136 also had an error]
> [thread -1086207120 also had an error]
> [thread -1088251024 also had an error]
> [thread -1088914576 also had an error]
> [thread -1086870672 also had an error]
> [thread 441797488 also had an error][thread 445778800 also had an error]
> [thread 440400752 also had an error]
> [thread 444119920 also had an error][thread 1151298416 also had an error]
> [thread 443124592 also had an error]
> [thread 1152625520 also had an error]
> [thread 913628016 also had an error]
> [thread -1095345296 also had an error][thread 1390799728 also had an error]
> [thread 443788144 also had an error]
> [thread 676506480 also had an error]
> [thread 1630595952 also had an error]
> pure virtual method called
> terminate called without an active exception
> pure virtual method called
> Aborted
> It seems to be a memory leak. We were opening 5 to 10 buffers to different 
> paths when writing, and closing them. We've tested that those buffers do not 
> overrun, and they don't. But watching the application continue writing, we 
> saw that over a period of 5 to 6 hours its memory use kept increasing, not 
> in steps of the 8MB buffer size that we've set, but by small amounts. 
> I'm reading the code and it seems there's a memory leak somewhere in the way 
> Hadoop does buffer allocation. Although we explicitly close buffers when the 
> count of open buffers goes above 5 (meaning 5 * 8MB per buffer), this bug 
> still happens.
> Can it be fixed? As you can see from the stack trace, it writes to a 
> "fan-out" path of the type shown in the stack trace. We let it run until 
> about 500M records, when this error hit. It's a blocker because these 
> writers need to be production-grade, and they are not, due to this native 
> buffer allocation which, when executing large numbers of writes, seems to 
> leak memory.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
