zymap commented on PR #4044:
URL: https://github.com/apache/bookkeeper/pull/4044#issuecomment-1669269780

   I tested with this code to verify the impact of batch size on memory:
   https://gist.github.com/zymap/19249ab35bb0f64c55cbf7f2e8356cb3
   I found that memory usage keeps increasing with the batch size. If the batch 
is not flushed into an SST file, it is saved in the WAL file, and the WAL file 
is not limited by `max_total_wal_size`. 
   If the bookie is OOM killed because of a large batch and the batch was 
saved in the WAL, the only way to reopen RocksDB is to give the bookie more 
memory.
   I also discussed this issue with the RocksDB community. They said:
   >when the batch size is so large (esp if you run multiple batches together) 
the wal size may reach (limit + batch-size * number of open batches). We have a 
project opened by our friends from Kafka streams to handle huge batch size. In 
the meanwhile can you restrict the size of your batch ?
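
   To make the bound in the quote concrete, here is a minimal sketch of the 
arithmetic (limit + batch-size * number of open batches). The class name, 
method name, and the sizes used in `main` are assumptions chosen for 
illustration, not measurements from our environment.

```java
class WalBoundSketch {

    /**
     * Worst-case WAL size per the quoted formula:
     * limit + batchBytes * openBatches. Throws ArithmeticException on overflow.
     */
    static long worstCaseWalBytes(long maxTotalWalSize, long batchBytes, int openBatches) {
        return Math.addExact(maxTotalWalSize,
                Math.multiplyExact(batchBytes, (long) openBatches));
    }

    public static void main(String[] args) {
        long limit = 1L << 30;   // assumed max_total_wal_size of 1 GiB
        long batch = 2L << 30;   // hypothetical 2 GiB batch
        // 1 GiB + 3 * 2 GiB = 7 GiB
        System.out.println(worstCaseWalBytes(limit, batch, 3)); // prints 7516192768
    }
}
```

   With these assumed numbers the WAL could grow to seven times the configured 
limit, so the limit alone does not bound it.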
   
   --
   In Pulsar, the compacted ledger has no rollover policy or retention 
policy. If the user has a huge number of keys in the compacted topic, the 
compacted ledger grows bigger and bigger. In our environment, a compacted 
ledger reached 200G. It holds a very large number of entries in a single 
ledger, which makes the batch very large.
   In release 4.14.7 and branch-4.15, we didn't limit the number of delete 
operations in a single batch. 
   
https://github.com/apache/bookkeeper/blob/c22136b03489db1643521f586e9cae2c4a511e10/bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/storage/ldb/EntryLocationIndex.java#L241
   As a result, when the bookie runs garbage collection and removes the 
ledger, it is OOM killed because of the large batch.
   
   Pulsar already has a proposal for configuring retention of the compacted 
topic ledger: https://github.com/apache/pulsar/issues/19665
   But I think we also need a way to control the batch size so that we can 
limit memory usage.
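
   One possible shape for such a limit is to flush the delete batch every N 
operations instead of once per ledger. The sketch below only illustrates the 
idea: the `Batch` interface, `RecordingBatch`, `deleteInChunks`, and 
`maxOpsPerBatch` are hypothetical names, not the BookKeeper or RocksDB API, 
and a real implementation may need to create a fresh batch after each flush.

```java
import java.util.ArrayList;
import java.util.List;

class BoundedDeleteSketch {

    /** Hypothetical stand-in for a key-value write batch; not a real API. */
    interface Batch {
        void remove(long key); // stage one delete in memory
        void flush();          // write the staged deletes and release them
    }

    /** Test double that records how many deletes each flush carried. */
    static final class RecordingBatch implements Batch {
        final List<Integer> flushSizes = new ArrayList<>();
        private int pending = 0;

        @Override
        public void remove(long key) {
            pending++;
        }

        @Override
        public void flush() {
            flushSizes.add(pending);
            pending = 0;
        }
    }

    /**
     * Stages a delete for every key, flushing as soon as maxOpsPerBatch
     * deletes are pending, so at most that many are held in memory at once.
     * Returns the total number of keys deleted.
     */
    static long deleteInChunks(Iterable<Long> keys, Batch batch, int maxOpsPerBatch) {
        if (maxOpsPerBatch <= 0) {
            throw new IllegalArgumentException("maxOpsPerBatch must be positive");
        }
        long total = 0;
        int pending = 0;
        for (long key : keys) {
            batch.remove(key);
            pending++;
            total++;
            if (pending >= maxOpsPerBatch) {
                batch.flush();
                pending = 0;
            }
        }
        if (pending > 0) {
            batch.flush(); // flush the final partial chunk
        }
        return total;
    }

    public static void main(String[] args) {
        List<Long> keys = new ArrayList<>();
        for (long i = 0; i < 10; i++) {
            keys.add(i);
        }
        RecordingBatch batch = new RecordingBatch();
        long deleted = deleteInChunks(keys, batch, 4);
        // prints "10 deletes, flush sizes [4, 4, 2]"
        System.out.println(deleted + " deletes, flush sizes " + batch.flushSizes);
    }
}
```

   The trade-off is that deleting a ledger is no longer a single atomic batch, 
so a crash in the middle can leave part of the ledger's index entries behind 
for a later garbage collection run to clean up.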

