[
https://issues.apache.org/jira/browse/HDFS-14308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932008#comment-16932008
]
Zhao Yi Ming commented on HDFS-14308:
-------------------------------------
We hit a direct buffer memory OOM when using HBase bulk load against an HDFS EC
folder. After reading some of the code, I think there is a potential risk in
ElasticByteBufferPool. As the code below shows, the tree keys pooled buffers by
(capacity, time), and on the HDFS client side DFSStripedInputStream allocates
direct buffers sized cellSize * dataBlkNum. The question is: if buffers of many
different sizes are requested (for example, many different cellSize values), the
pool retains all of them, which can introduce a direct buffer memory OOM.
{code:java}
// ElasticByteBufferPool.putBuffer()
public synchronized void putBuffer(ByteBuffer buffer) {
  buffer.clear();
  TreeMap<Key, ByteBuffer> tree = getBufferTree(buffer.isDirect());
  while (true) {
    Key key = new Key(buffer.capacity(), System.nanoTime());
    if (!tree.containsKey(key)) {
      tree.put(key, buffer);
      return;
    }
    // Buffers are indexed by (capacity, time).
    // If our key is not unique on the first try, we try again, since the
    // time will be different. Since we use nanoseconds, it's pretty
    // unlikely that we'll loop even once, unless the system clock has a
    // poor granularity.
  }
}
{code}
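For context, the matching getBuffer() side of the pool only reuses a pooled buffer
whose capacity is at least the requested length (via ceilingEntry on the same tree);
if nothing is large enough, it allocates a brand-new direct buffer, and nothing is
ever evicted. The following is a paraphrased sketch of that lookup (see
hadoop-common's ElasticByteBufferPool for the authoritative code):
{code:java}
// Paraphrased sketch of ElasticByteBufferPool.getBuffer(), not a verbatim copy.
public synchronized ByteBuffer getBuffer(boolean direct, int length) {
  TreeMap<Key, ByteBuffer> tree = getBufferTree(direct);
  // Smallest pooled buffer whose capacity is >= the requested length.
  Map.Entry<Key, ByteBuffer> entry = tree.ceilingEntry(new Key(length, 0));
  if (entry == null) {
    // Nothing large enough is pooled, so a fresh (direct) buffer is allocated.
    return direct ? ByteBuffer.allocateDirect(length)
                  : ByteBuffer.allocate(length);
  }
  tree.remove(entry.getKey());
  return entry.getValue();
}
{code}
So when a workload keeps requesting new, larger sizes (e.g. different
cellSize * dataBlkNum values), no pooled buffer can ever be reused, and every buffer
handed back via putBuffer() stays referenced by the pool indefinitely.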
I wrote a simple test, shown below, that reproduces the problem.
Set the following JVM arguments first, then run the test; it will hit the OOM.
-Xmx64m
-Xms64m
-Xmn32m
-XX:+UseConcMarkSweepGC
-XX:+PrintGCDetails
-XX:MaxDirectMemorySize=10M
{code:java}
import java.nio.ByteBuffer;

import org.apache.hadoop.io.ByteBufferPool;
import org.apache.hadoop.io.ElasticByteBufferPool;
import org.junit.Test;

public class TestEBBP {
  private static final ByteBufferPool BUFFER_POOL = new ElasticByteBufferPool();

  @Test
  public void testOOM() {
    for (int i = 0; i < 100; i++) {
      // Each iteration requests a direct buffer of a new, larger size, so the
      // pool can never satisfy it from an existing entry and allocates afresh.
      ByteBuffer buffer = BUFFER_POOL.getBuffer(true, 1024 * 6 * i);
      BUFFER_POOL.putBuffer(buffer);
    }
    System.out.println(((ElasticByteBufferPool) BUFFER_POOL).size(true));
  }
}
{code}
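For scale: the loop allocates direct buffers of 0 KB, 6 KB, 12 KB, ... up to 594 KB,
and every one of them stays referenced by the pool, so roughly
6 KB * (0 + 1 + ... + 99) ≈ 29 MB of direct memory is requested in total. With
-XX:MaxDirectMemorySize=10M the allocation fails partway through the loop with
java.lang.OutOfMemoryError: Direct buffer memory.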
I am not certain this is the root cause of this issue, but I am writing it up
here FYI.
> DFSStripedInputStream curStripeBuf is not freed by unbuffer()
> -------------------------------------------------------------
>
> Key: HDFS-14308
> URL: https://issues.apache.org/jira/browse/HDFS-14308
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 3.0.0
> Reporter: Joe McDonnell
> Priority: Major
> Attachments: ec_heap_dump.png
>
>
> Some users of HDFS cache opened HDFS file handles to avoid repeated
> roundtrips to the NameNode. For example, Impala caches up to 20,000 HDFS file
> handles by default. Recent tests on erasure coded files show that the open
> file handles can consume a large amount of memory when not in use.
> For example, here is output from Impala's JMX endpoint when 608 file handles
> are cached:
> {noformat}
> {
> "name": "java.nio:type=BufferPool,name=direct",
> "modelerType": "sun.management.ManagementFactoryHelper$1",
> "Name": "direct",
> "TotalCapacity": 1921048960,
> "MemoryUsed": 1921048961,
> "Count": 633,
> "ObjectName": "java.nio:type=BufferPool,name=direct"
> },{noformat}
> This shows direct buffer memory usage of 3MB per DFSStripedInputStream.
> Attached is output from Eclipse MAT showing that the direct buffers come from
> DFSStripedInputStream objects. Both Impala and HBase call unbuffer() when a
> file handle is being cached and potentially unused for significant chunks of
> time, yet this shows that the memory remains in use.
> To support caching file handles on erasure coded files, DFSStripedInputStream
> should avoid holding buffers after the unbuffer() call. See HDFS-7694.
> "unbuffer()" is intended to move an input stream to a lower memory state to
> support these caching use cases. In particular, the curStripeBuf seems to be
> allocated from the BUFFER_POOL on a resetCurStripeBuffer(true) call. It is
> not freed until close().