RaulGracia opened a new issue #2521:
URL: https://github.com/apache/bookkeeper/issues/2521


   **BUG REPORT**
   
   ***Describe the bug***
   
   We are executing a set of experiments with BookKeeper 4.9.2 (via Pulsar). 
The experiments consist of a set of 5-minute write/read workloads at 250MBps 
(1KB events). In each 5-minute workload, we vary the number of 
producers/readers (1, 10, 50, 100) and topic partitions (1, 10, 100, 500, 1000, 
5000). We use 3 Bookies (jointly deployed with 3 Pulsar Brokers) and write 
to BookKeeper with `writeQuorum=ensembleSize=3` and `ackQuorumSize=2` (more 
details about the deployment and experiments are provided below).
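
   When writing through Pulsar, these quorum settings correspond to the 
managed-ledger defaults in `broker.conf`; the property names below are taken 
from current Pulsar releases and may differ slightly across versions:

   ```properties
   # broker.conf -- BookKeeper replication settings used by Pulsar's managed ledgers
   managedLedgerDefaultEnsembleSize=3
   managedLedgerDefaultWriteQuorum=3
   managedLedgerDefaultAckQuorum=2
   ```

   With an ack quorum of 2 out of 3, an add is acknowledged to the client once 
2 replicas confirm it, so a single slow Bookie does not block writes.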
   
   After some time (30-45 mins), we observe one Bookie crashing with an 
`OutOfMemoryError`:
   ```
   Dec 10 12:38:46 ip-10-0-0-187.us-west-2.compute.internal pulsar[13379]: 
12:38:46.808 [BookieWriteThreadPool-OrderedExecutor-4-0] INFO  
org.apache.bookkeeper.bookie.storage.ldb.DbLedgerStorage - Write cache is full, 
triggering flush
   Dec 10 12:38:48 ip-10-0-0-187.us-west-2.compute.internal pulsar[13379]: 
12:38:48.829 [BookieWriteThreadPool-OrderedExecutor-5-0] INFO  
org.apache.bookkeeper.bookie.storage.ldb.DbLedgerStorage - Write cache is full, 
triggering flush
   Dec 10 12:38:50 ip-10-0-0-187.us-west-2.compute.internal pulsar[13379]: 
12:38:50.172 [db-storage-12-1] INFO  
org.apache.bookkeeper.bookie.EntryLogManagerBase - Creating a new entry log 
file : createNewLog = false, reachEntryLogLimit = true
   Dec 10 12:38:50 ip-10-0-0-187.us-west-2.compute.internal pulsar[13379]: 
12:38:50.178 [pool-5-thread-1] INFO  
org.apache.bookkeeper.bookie.EntryLoggerAllocator - Created new entry log file 
/mnt/storage/current/1b8.log for logId 440.
   Dec 10 12:38:50 ip-10-0-0-187.us-west-2.compute.internal pulsar[13379]: 
12:38:50.175 [db-storage-12-1] INFO  
org.apache.bookkeeper.bookie.EntryLogManagerBase - Flushing entry logger 438 
back to filesystem, pending for syncing entry loggers : 
[BufferedChannel{logId=438, logFile=/mnt/storage/current/1b6.log, 
ledgerIdAssigned=-1}].
   Dec 10 12:38:50 ip-10-0-0-187.us-west-2.compute.internal pulsar[13379]: 
12:38:50.962 [db-storage-12-1] INFO  
org.apache.bookkeeper.bookie.EntryLogManagerForSingleEntryLog - Synced entry 
logger 438 to disk.
   Dec 10 12:38:52 ip-10-0-0-187.us-west-2.compute.internal pulsar[13379]: 
12:38:52.636 [bookie-io-1-6] ERROR org.apache.bookkeeper.proto.BookieServer - 
Unable to allocate memory, exiting bookie
   Dec 10 12:38:52 ip-10-0-0-187.us-west-2.compute.internal pulsar[13379]: 
java.lang.OutOfMemoryError: Direct buffer memory
   Dec 10 12:38:52 ip-10-0-0-187.us-west-2.compute.internal pulsar[13379]: at 
java.nio.Bits.reserveMemory(Bits.java:175) ~[?:?]
   Dec 10 12:38:52 ip-10-0-0-187.us-west-2.compute.internal pulsar[13379]: at 
java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:118) ~[?:?]
   Dec 10 12:38:52 ip-10-0-0-187.us-west-2.compute.internal pulsar[13379]: at 
java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:317) ~[?:?]
   ```
   By default, the JVM options set a direct memory limit of 6GB. In a 
subsequent experiment, we increased that limit to 24GB, but we observed the 
same problem again:
   ```
   Dec 10 15:42:31 ip-10-0-0-47.us-west-2.compute.internal pulsar[13375]: 
15:42:31.327 [db-storage-12-1] INFO  org.apache.bookkeeper.bookie.Journal - 
garbage collected journal 1764d280e03.txn
   Dec 10 15:42:31 ip-10-0-0-47.us-west-2.compute.internal pulsar[13375]: 
15:42:31.355 [BookieWriteThreadPool-OrderedExecutor-5-0] INFO  
org.apache.bookkeeper.bookie.storage.ldb.DbLedgerStorage - Write cache is full, 
triggering flush
   Dec 10 15:42:33 ip-10-0-0-47.us-west-2.compute.internal pulsar[13375]: 
15:42:33.533 [db-storage-12-1] INFO  org.apache.bookkeeper.bookie.Journal - 
garbage collected journal 1764d280de6.txn
   Dec 10 15:42:34 ip-10-0-0-47.us-west-2.compute.internal pulsar[13375]: 
15:42:34.019 [db-storage-cleanup-13-1] INFO  
org.apache.bookkeeper.bookie.storage.ldb.EntryLocationIndex - Deleted indexes 
for 18879174 entries from 28 ledgers in 13.935 seconds
   Dec 10 15:42:42 ip-10-0-0-47.us-west-2.compute.internal pulsar[13375]: 
15:42:42.674 [bookie-io-1-17] ERROR org.apache.bookkeeper.proto.BookieServer - 
Unable to allocate memory, exiting bookie
   Dec 10 15:42:42 ip-10-0-0-47.us-west-2.compute.internal pulsar[13375]: 
java.lang.OutOfMemoryError: Direct buffer memory
   Dec 10 15:42:42 ip-10-0-0-47.us-west-2.compute.internal pulsar[13375]: at 
java.nio.Bits.reserveMemory(Bits.java:175) ~[?:?]
   Dec 10 15:42:42 ip-10-0-0-47.us-west-2.compute.internal pulsar[13375]: at 
java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:118) ~[?:?]
   Dec 10 15:42:42 ip-10-0-0-47.us-west-2.compute.internal pulsar[13375]: at 
java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:317) ~[?:?]
   ```
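
   For reference, the direct memory limit in these runs is controlled through 
the JVM options. In a Pulsar deployment this is typically set via `PULSAR_MEM` 
in `conf/pulsar_env.sh` (the exact file and variable names may vary by version, 
and the heap sizes below are illustrative):

   ```shell
   # conf/pulsar_env.sh -- JVM memory settings for the broker/bookie process.
   # -XX:MaxDirectMemorySize caps off-heap (Netty/NIO) buffers; this is the
   # limit that was raised from 6g to 24g between the two experiments.
   PULSAR_MEM="-Xms24g -Xmx24g -XX:MaxDirectMemorySize=24g"
   ```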
   
   Notably, in the second reproduction, we can observe that the logs stall 
from `15:42:34.019` to `15:42:42.674`, when the `OutOfMemoryError` appears 
(about 8 seconds later). On the line just before the stall, there is 
garbage-collection activity related to `DbLedgerStorage`. Before reaching this 
point, the benchmark suite has already executed multiple experiments with 
hundreds or thousands of Pulsar partitions. As each topic is deleted after its 
5-minute workload, could there be an issue with the memory resources required 
when `DbLedgerStorage` deletes thousands of ledgers that become eligible for 
collection? If this keeps the Bookie too busy while the benchmark continues to 
write and read data, direct memory could end up exhausted (e.g., by Netty 
allocating buffers for incoming writes).
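
   To check whether direct memory keeps growing during the suspected 
garbage-collection stalls, one could sample the JVM's NIO buffer-pool 
statistics with a small probe like the sketch below (a generic diagnostic, not 
BookKeeper-specific; note that Netty pooled buffers allocated through `Unsafe` 
may not appear in this pool even though they count against 
`-XX:MaxDirectMemorySize`):

   ```java
   import java.lang.management.BufferPoolMXBean;
   import java.lang.management.ManagementFactory;
   import java.nio.ByteBuffer;
   import java.util.List;

   public class DirectMemoryProbe {
       public static void main(String[] args) {
           // Allocate one direct buffer so the "direct" pool is non-empty.
           ByteBuffer buf = ByteBuffer.allocateDirect(1 << 20); // 1 MiB

           // The platform MXBeans expose per-pool usage for the "direct"
           // and "mapped" NIO buffer pools.
           List<BufferPoolMXBean> pools =
                   ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class);
           for (BufferPoolMXBean pool : pools) {
               System.out.printf("pool=%s count=%d used=%d capacity=%d%n",
                       pool.getName(), pool.getCount(),
                       pool.getMemoryUsed(), pool.getTotalCapacity());
           }
       }
   }
   ```

   Running this periodically (or polling the same MXBean over JMX against a 
live Bookie) would show whether the "direct" pool approaches the configured 
limit before the crash.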
   
   
   ***To Reproduce***
   
   Deployed on AWS using 10 worker VMs (c5.4xlarge, half for producers and 
half for readers) running the OpenMessaging Benchmark, and 3 VMs (i3.16xlarge) 
jointly running Pulsar Brokers/Bookies. The instructions for deployment can be 
found here: https://github.com/pravega/openmessaging-benchmark
   
   The logs for the experiments can be found here:
   
[experiment-1-6GB-direct-memory.zip](https://github.com/apache/bookkeeper/files/5710571/experiment-1-6GB-direct-memory.zip)
   
[experiment-2-24GB-direct-memory.zip](https://github.com/apache/bookkeeper/files/5710570/experiment-2-24GB-direct-memory.zip)
   
   ***Expected behavior***
   
   In our experiments, we have verified that BookKeeper can write faster than 
250MBps in this environment (over 400MBps), so we do not believe the problem 
is caused by pushing the system too hard ([see this blog post as a 
reference](https://blog.pravega.io/2020/10/01/when-speeding-makes-sense-fast-consistent-durable-and-scalable-streaming-data-with-pravega/)).
 In fact, the system works fine for a long time (30-45 mins). However, using 
`ackQuorumSize=2` may make it visible (via an `OutOfMemoryError`) that the 
Bookie occasionally gets stuck, perhaps while garbage collecting data from 
thousands of ledgers in `DbLedgerStorage`. If this hypothesis is correct, the 
expected behavior would be that garbage collection does not induce this kind 
of problem under moderate/high load.
   
   ***Screenshots***
   
   n/a
   
   ***Additional context***
   
   I will try the same experiment with `ackQuorumSize=3` and will update this 
issue with the results. 
   

