merlimat opened a new issue, #3559:
URL: https://github.com/apache/bookkeeper/issues/3559

   The BK data path is very efficient when processing large entries and it's 
generally able to saturate the disk and network IO in these cases.
   
   By contrast, when handling a large number of very small entries, several 
inefficiencies cause the CPU to become the bottleneck, because of the 
per-entry overhead. 
   
   There is plenty of low-hanging fruit to tackle to improve performance: 
   
   #### Reduce contention in message passing
   
   Reduce contention in journal & force-write queues: 
   - [ ] #3544 
   - [x] #3545
   
   Improve the OrderedExecutor performance: 
   - [ ] #3546
   #### Reduce the number of buffers allocated per entry written/read
   
   For each entry written to a ledger we use 4 `ByteBuf` instances: 
    1. The entry payload (this gets passed in to the BK client)
    2. The checksum
    3. The serialized `AddRequest`
    4. The 4-byte frame size header
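   As a simplified illustration of the 4 allocations above (using `java.nio.ByteBuffer` as a stand-in for Netty's `ByteBuf`; the checksum and request sizes here are illustrative, not the actual wire format):

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class PerEntryBuffers {
    // Sketch: four separate buffer allocations for a single written entry.
    static List<ByteBuffer> buffersForEntry(byte[] payload) {
        List<ByteBuffer> bufs = new ArrayList<>();
        ByteBuffer payloadBuf = ByteBuffer.wrap(payload);   // 1. entry payload
        ByteBuffer checksum = ByteBuffer.allocate(8);       // 2. checksum (size illustrative)
        ByteBuffer request = ByteBuffer.allocate(32);       // 3. serialized AddRequest (size illustrative)
        int frameLen = request.capacity() + checksum.capacity() + payload.length;
        ByteBuffer sizeHeader = ByteBuffer.allocate(4).putInt(0, frameLen); // 4. 4-byte frame size
        bufs.add(sizeHeader);
        bufs.add(request);
        bufs.add(checksum);
        bufs.add(payloadBuf);
        return bufs;
    }

    public static void main(String[] args) {
        // Even a 3-byte entry costs four buffer allocations.
        System.out.println(buffersForEntry(new byte[]{1, 2, 3}).size());
    }
}
```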
   
   These buffers are handed to Netty, which performs a gathering `writev`, 
passing all of them to the socket. 
   Allocating and managing all these buffers is expensive. There is overhead in: 
    * Refcounting
    * The `Recycler`, to get the `ByteBuf` instances and put them back in the pool
    * The `ByteBuf` pool arena, to handle allocations/deallocations
    * Inter-thread synchronization: these buffers are normally allocated in one 
thread and deallocated in a different one
   
   To make matters worse, while the checksum is computed only once, the 
`AddRequest` is serialized anew each time we write it on a connection. 
   E.g. with write-quorum=3, we end up using (2 * 3) + 1 = 7 `ByteBuf` 
instances per entry.
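   The accounting above can be expressed as a function of the write quorum, following the formula in the text:

```java
public class BufferCount {
    // Per the accounting above: two buffers are produced per connection
    // the entry is written to, plus one buffer shared across all of them.
    static int byteBufsPerEntry(int writeQuorum) {
        return 2 * writeQuorum + 1;
    }

    public static void main(String[] args) {
        System.out.println(byteBufsPerEntry(3)); // 7, as in the example above
    }
}
```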
   
   Finally, while for big entries it is very important to avoid copying the 
payload, for small entries the overhead of maintaining the `ByteBufList` is 
greater than just copying the payload into a single buffer.
   For that we should: 
    1. If the entry is big -> keep using `ByteBufList`, with one buffer for 
all the headers and a second buffer referencing the payload, with no copy.
    2. If the entry is small -> allocate a single buffer to contain all the 
headers and the payload, and copy the payload into it.
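   A minimal sketch of that split, again using `java.nio.ByteBuffer` as a stand-in for Netty's `ByteBuf`; the threshold and header size are hypothetical values, and the real cutoff would be tuned empirically:

```java
import java.nio.ByteBuffer;
import java.util.List;

public class EntrySerializer {
    // Hypothetical cutoff between the copy and zero-copy paths.
    static final int SMALL_ENTRY_THRESHOLD = 4 * 1024;

    // Returns the buffers handed to the transport for one entry.
    static List<ByteBuffer> serialize(ByteBuffer headers, ByteBuffer payload) {
        if (payload.remaining() < SMALL_ENTRY_THRESHOLD) {
            // Small entry: one allocation, headers and payload copied into it.
            ByteBuffer consolidated =
                ByteBuffer.allocate(headers.remaining() + payload.remaining());
            consolidated.put(headers.duplicate()).put(payload.duplicate()).flip();
            return List.of(consolidated);
        }
        // Big entry: second buffer references the payload, with no copy.
        return List.of(headers, payload);
    }

    public static void main(String[] args) {
        ByteBuffer headers = ByteBuffer.allocate(44); // illustrative headers size
        System.out.println(serialize(headers, ByteBuffer.allocate(100)).size());
        System.out.println(serialize(headers, ByteBuffer.allocate(1 << 20)).size());
    }
}
```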
   
   Pending changes:
    - [ ] Add the 4 bytes frame size header when serializing the request, 
instead of relying on a separate Netty filter
    - [ ] Consolidate buffer for small entries on read-response
    - [ ] Serialize only once and consolidate small entries for add requests
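   For the first pending change, the idea can be sketched as reserving the 4-byte length slot up front and patching it after serialization, instead of having a separate pipeline handler prepend a dedicated length buffer (method and sizes here are illustrative):

```java
import java.nio.ByteBuffer;

public class FramedRequest {
    // Sketch: write the 4-byte frame size as part of serializing the
    // request itself, rather than in a separate Netty filter that
    // allocates one more buffer just for the length.
    static ByteBuffer serializeWithFrame(byte[] requestBytes) {
        ByteBuffer buf = ByteBuffer.allocate(4 + requestBytes.length);
        buf.position(4);                    // leave room for the frame length
        buf.put(requestBytes);              // serialize the request body
        buf.putInt(0, buf.position() - 4);  // patch the length at offset 0
        buf.flip();
        return buf;
    }

    public static void main(String[] args) {
        ByteBuffer framed = serializeWithFrame(new byte[]{1, 2, 3});
        System.out.println(framed.getInt(0)); // frame length patched in place
    }
}
```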
   

