suiyuzeng opened a new issue #2806:
URL: https://github.com/apache/bookkeeper/issues/2806


   **BUG REPORT**
   
   ***Describe the bug***
   There are about 1M ledgers per entry log. After running for a while, an OOM 
occurs, even though there is still plenty of free heap.
   There are two OOM positions, as follows:
   Position 1:
   2021-09-21 02:22:08,323 [SyncThread-7-1] ERROR 
org.apache.bookkeeper.bookie.SyncThread - Exception in SyncThread
   java.lang.OutOfMemoryError: Java heap space
       at 
org.apache.bookkeeper.bookie.storage.ldb.WriteCache.forEach(WriteCache.java:222)
 ~[org.apache.bookkeeper-bookkeeper-server-4.14.1.jar:4.14.1]
   
   Position 2:
   2021-09-21 02:24:14,987 [SyncThread-7-1] ERROR 
org.apache.bookkeeper.bookie.SyncThread - Exception in SyncThread
   java.lang.OutOfMemoryError: Java heap space
       at 
org.apache.bookkeeper.util.collections.ConcurrentLongLongHashMap$Section.rehash(ConcurrentLongLongHashMap.java:673)
 ~[org.apache.bookkeeper-bookkeeper-server-4.14.1.jar:4.14.1]
       at 
org.apache.bookkeeper.util.collections.ConcurrentLongLongHashMap$Section.addAndGet(ConcurrentLongLongHashMap.java:456)
 ~[org.apache.bookkeeper-bookkeeper-server-4.14.1.jar:4.14.1]
       at 
org.apache.bookkeeper.util.collections.ConcurrentLongLongHashMap.addAndGet(ConcurrentLongLongHashMap.java:186)
 ~[org.apache.bookkeeper-bookkeeper-server-4.14.1.jar:4.14.1]
       at 
org.apache.bookkeeper.bookie.EntryLogMetadata.addLedgerSize(EntryLogMetadata.java:47)
 ~[org.apache.bookkeeper-bookkeeper-server-4.14.1.jar:4.14.1]
   
   main GC log:
   2021-09-21T02:22:08.437+0800: 105453.194: [GC pause (G1 Humongous 
Allocation) 
   2021-09-21T02:22:08.449+0800: 105453.206: [Full GC (Allocation Failure)  
10G->10G(20G), 1.9652874 secs]
   [Eden: 0.0B(992.0M)->0.0B(1024.0M) Survivors: 32.0M->0.0B Heap: 
10.3G(20.0G)->10.3G(20.0G)], [Metaspace: 35551K->35539K(1081344K)]
   [Times: user=4.94 sys=0.00, real=1.96 secs]
   2021-09-21T02:22:10.415+0800: 105455.172: [Full GC (Allocation Failure)  
10G->10G(20G), 1.6151095 secs]
   [Eden: 0.0B(1024.0M)->0.0B(1024.0M) Survivors: 0.0B->0.0B Heap: 
10.3G(20.0G)->10.3G(20.0G)], [Metaspace: 35539K->35539K(1081344K)]
   [Times: user=4.32 sys=0.00, real=1.62 secs]
   
   The common feature is that both are allocating a humongous contiguous block of memory.
   Position 1:
   In WriteCache.forEach, with about 1M entries per minute, the sortedEntries 
size should be 1M * 4 * 2 * 8 = 64 MB.
   
   Position 2:
   With 1M ledgers per entry log, the table size of ConcurrentLongLongHashMap 
should be 1M * 2 * 2 * 8 = 32 MB. Sometimes there are more than 1M ledgers, so the 
allocation can be even larger than 32 MB.
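
   The two size estimates above can be checked with a short, self-contained sketch. All the factors come from the calculations in this report (not from measurements), and the 16 MB humongous threshold assumes G1's rule that any allocation larger than half a region (here 32m) is humongous:

   ```java
   // Rough check of the allocation sizes cited in this report.
   // All multipliers are taken from the report itself.
   public class HumongousAllocEstimate {
       static final long G1_REGION_BYTES = 32L * 1024 * 1024;       // -XX:G1HeapRegionSize=32m (max)
       static final long HUMONGOUS_THRESHOLD = G1_REGION_BYTES / 2; // >50% of a region is humongous

       public static void main(String[] args) {
           // Position 1: sortedEntries in WriteCache.forEach (~1M entries per minute)
           long sortedEntriesBytes = 1_000_000L * 4 * 2 * 8;

           // Position 2: hash table of ConcurrentLongLongHashMap in EntryLogMetadata
           // (~1M ledgers per entry log)
           long hashTableBytes = 1_000_000L * 2 * 2 * 8;

           System.out.println("sortedEntries: " + sortedEntriesBytes / 1_000_000 + " MB, humongous="
                   + (sortedEntriesBytes > HUMONGOUS_THRESHOLD));
           System.out.println("hash table:    " + hashTableBytes / 1_000_000 + " MB, humongous="
                   + (hashTableBytes > HUMONGOUS_THRESHOLD));
       }
   }
   ```

   Both allocations are well above the 16 MB threshold, which is why they show up as "G1 Humongous Allocation" pauses in the GC log.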
   
   Since we use G1 and G1HeapRegionSize is 32m (the maximum value), there may be no 
contiguous regions available for such a humongous allocation. After pre-allocating a 
large buffer for sortedEntries and increasing the concurrencyLevel of the 
ConcurrentLongLongHashMap in EntryLogMetadata, the issue did not appear again. 
How about adding two configuration options for these? 
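
   To illustrate why a higher concurrencyLevel helps: the map splits its capacity across that many sections, each backed by its own array, so no single allocation has to cover the whole table. This is a simplified model of that sizing, not BookKeeper's actual class:

   ```java
   // Simplified model: a sectioned hash map allocates one long[] table per section,
   // so each allocation is (capacity / concurrencyLevel) buckets rather than the
   // whole table. Smaller per-section arrays stay below G1's humongous threshold.
   public class SectionSizing {
       static long sectionTableBytes(long totalCapacity, int concurrencyLevel) {
           long bucketsPerSection = totalCapacity / concurrencyLevel;
           return bucketsPerSection * 2 * 8; // key long + value long, 8 bytes each
       }

       public static void main(String[] args) {
           long capacity = 2_000_000L;          // ~1M ledgers with 2x table headroom
           long humongous = 16L * 1024 * 1024;  // half of a 32m G1 region

           System.out.println("level=1:  " + sectionTableBytes(capacity, 1) + " bytes, humongous="
                   + (sectionTableBytes(capacity, 1) > humongous));
           System.out.println("level=64: " + sectionTableBytes(capacity, 64) + " bytes, humongous="
                   + (sectionTableBytes(capacity, 64) > humongous));
       }
   }
   ```

   With a single section the rehash needs one 32 MB array (humongous); with 64 sections each array is 500 KB and fits easily inside ordinary regions.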
   
   Because the metadata of every entry log is kept in memory, the memory it occupies 
is very large: at 32 MB of EntryLogMetadata per entry log, the footprint would be 
several GB with hundreds of entry logs. I delete ledgers by time: a ledger is deleted 
after it expires, and an entry log that has not expired will not be deleted, so its 
metadata does not need to be loaded into memory at all. How about adding a feature 
like this?
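
   As a back-of-the-envelope check of the "several GB" claim (the entry log count of 200 is an assumed example for "hundreds of entry logs"):

   ```java
   // Estimate of resident EntryLogMetadata memory when all entry log
   // metadata is held in memory at once.
   public class MetadataFootprint {
       public static void main(String[] args) {
           long bytesPerEntryLog = 32_000_000L; // ~32 MB ledger map per entry log (from this report)
           int entryLogs = 200;                 // assumed: "hundreds of entry logs"
           long totalBytes = bytesPerEntryLog * entryLogs;
           System.out.println("total metadata: " + totalBytes / 1_000_000_000L + " GB");
       }
   }
   ```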

