dlg99 opened a new pull request #1410: Issue #1409: Added server side backpressure (@bug W-3651831@)
URL: https://github.com/apache/bookkeeper/pull/1410
 
 
   Added server-side backpressure handling and related unit tests.
   
   Background:
   
   BK’s writes happen in this order on the server side:
   First, ledger storage (Interleaved|Sorted), which is presumably non-blocking to some degree.
   Second, the journal.
   A request is finished when the journal’s write has been fsynced.
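   The following is a minimal model of that ordering, for illustration only. `LedgerStorageStub`, `JournalStub` and `WritePath` are hypothetical names, not BookKeeper classes. The journal fsync is simulated by a task on a separate single-thread executor, which mirrors the journal’s separate fsync/ack thread.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for ledger storage (Interleaved|Sorted); may block on I/O.
class LedgerStorageStub {
    private final List<String> events;

    LedgerStorageStub(List<String> events) { this.events = events; }

    void addEntry(long entryId) {
        synchronized (events) { events.add("storage:" + entryId); }
    }
}

// Hypothetical stand-in for the journal: fsync and ack run on their own thread.
class JournalStub {
    private final List<String> events;
    private final ExecutorService fsyncThread = Executors.newSingleThreadExecutor();

    JournalStub(List<String> events) { this.events = events; }

    CompletableFuture<Void> logAddEntry(long entryId) {
        // Simulated fsync; the future completes once the entry is "durable".
        return CompletableFuture.runAsync(() -> {
            synchronized (events) { events.add("journal-fsync:" + entryId); }
        }, fsyncThread);
    }

    void shutdown() { fsyncThread.shutdown(); }
}

class WritePath {
    private final LedgerStorageStub storage;
    private final JournalStub journal;

    WritePath(LedgerStorageStub storage, JournalStub journal) {
        this.storage = storage;
        this.journal = journal;
    }

    // Step 1: ledger storage. Step 2: journal.
    // The request is finished only when the journal write has been fsynced.
    CompletableFuture<Void> addEntry(long entryId) {
        storage.addEntry(entryId);
        return journal.logAddEntry(entryId);
    }
}
```

   The future returned by `addEntry` completes only after the simulated fsync. That is the point at which the request counts as finished and its response can be sent.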
   
   Three major moving parts on the server side need to be taken into account:
   - Journal: its performance (I/O delays) and its batching delay configuration (or misconfiguration) affect client latency. The journal batches internally and uses a separate thread for data fsync/request ack.
   - InterleavedLedgerStorage/entry log: if it is blocked on I/O, it naturally blocks the request before the request reaches the journal.
   - SortedLedgerStorage: puts the request into an in-memory data structure (a SkipList, aka memtable) until the memtable reaches a certain size limit, and then flushes it to disk asynchronously.
   
   Implementation:
   
   1. Limit the number of requests in progress, separately for reads and writes, since they are handled in different thread pools.
   Requests in progress (RIPs) are requests being processed by threads in the thread pool plus requests waiting for the next free thread. A RIP’s lifetime runs from the moment the request is received/read by Netty to the moment its response is sent.
   The target RIP limit (a heuristic) is ((number of processing threads)*2 + (expected max batch size on journal)), so each thread can have one request to process and the next one waiting, with room on top for requests waiting on a journal batch fsync.
   It is assumed that we have enough memory to hold the data for that many requests. In any case, the amount of data a read request will need cannot be estimated at the moment the request is received.
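   Here is a sketch of RIP accounting under these rules, assuming one counter per request type. `RequestType`, `RequestsInProgress` and its methods are hypothetical names. `heuristicLimit` encodes the formula above, and a limit of zero is treated as "no limit".

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Reads and writes are handled by different thread pools, so each gets its own limit.
enum RequestType { READ, WRITE }

// Hypothetical tracker of requests in progress (RIPs): a request counts from the
// moment it is read off the connection until its response has been sent.
class RequestsInProgress {
    private final Map<RequestType, Integer> limits = new EnumMap<>(RequestType.class);
    private final Map<RequestType, AtomicInteger> counts = new EnumMap<>(RequestType.class);

    RequestsInProgress(int readLimit, int writeLimit) {
        limits.put(RequestType.READ, readLimit);
        limits.put(RequestType.WRITE, writeLimit);
        for (RequestType t : RequestType.values()) {
            counts.put(t, new AtomicInteger());
        }
    }

    // Heuristic from the text: one request running and one waiting per thread,
    // plus the expected maximum journal batch.
    static int heuristicLimit(int processingThreads, int maxJournalBatch) {
        return 2 * processingThreads + maxJournalBatch;
    }

    void onReceived(RequestType t) { counts.get(t).incrementAndGet(); }

    void onResponseSent(RequestType t) { counts.get(t).decrementAndGet(); }

    // A limit of zero (or less) means "no limit" for that request type.
    boolean limitReached(RequestType t) {
        int limit = limits.get(t);
        return limit > 0 && counts.get(t).get() >= limit;
    }

    int inProgress(RequestType t) { return counts.get(t).get(); }
}
```

   For example, with 8 processing threads and an expected journal batch of 16, `heuristicLimit(8, 16)` gives 2*8 + 16 = 32 RIPs.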
   
   The limit is configured by setting the number of RIPs explicitly in the config, for the following reasons:
   - It is easier to experiment with different numbers, e.g. ((number of processing threads) + 2*(expected max batch size on journal)) or simply (2*(expected max batch size on journal)).
   - Requests can optionally be run directly on the Netty thread. In that case there is no config parameter to base an initial value on, and Netty’s defaults can change between versions.
   - It removes the need for an explicit enable/disable backpressure flag; instead, the RIP limit can be set to zero.
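   Here is a sketch of reading explicit limits from configuration, assuming `java.util.Properties` as the config source. The key names `writeRequestsInProgressLimit`/`readRequestsInProgressLimit` and the default of 0 are assumptions for illustration, not BookKeeper’s actual settings. The static helpers encode the candidate heuristics listed above.

```java
import java.util.Properties;

// Hypothetical explicit RIP limits read from configuration.
final class RipLimitConfig {
    // Assumed key names, for illustration only.
    static final String WRITE_KEY = "writeRequestsInProgressLimit";
    static final String READ_KEY = "readRequestsInProgressLimit";

    final int writeLimit;
    final int readLimit;

    RipLimitConfig(Properties props) {
        // Assumed default of 0: no separate enable/disable flag, zero simply means "off".
        writeLimit = Integer.parseInt(props.getProperty(WRITE_KEY, "0").trim());
        readLimit = Integer.parseInt(props.getProperty(READ_KEY, "0").trim());
    }

    boolean writeBackpressureEnabled() { return writeLimit > 0; }

    boolean readBackpressureEnabled() { return readLimit > 0; }

    // Candidate heuristics to experiment with when picking a value.
    static int twoPerThreadPlusBatch(int threads, int batch) { return 2 * threads + batch; }

    static int threadsPlusTwoBatches(int threads, int batch) { return threads + 2 * batch; }

    static int twoBatches(int batch) { return 2 * batch; }
}
```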
   
   2. Pause Netty’s autoread when the limit is exceeded, to prevent it from pulling in more data before we can track that data as RIPs.
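   Here is a minimal sketch of the autoread toggle, assuming a single channel. `AutoReadControl` and `AutoReadThrottler` are hypothetical. With Netty, the wrapper would delegate to `channel.config().setAutoRead(boolean)`. Reading is paused when the counter reaches the limit and resumed when the counter drops back below it.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical wrapper; with Netty this would call channel.config().setAutoRead(enabled).
interface AutoReadControl {
    void setAutoRead(boolean enabled);
}

class AutoReadThrottler {
    private final int limit;
    private final AtomicInteger inProgress = new AtomicInteger();
    private final AutoReadControl channel;

    AutoReadThrottler(int limit, AutoReadControl channel) {
        this.limit = limit;
        this.channel = channel;
    }

    // A request was decoded: count it, and stop reading exactly when the limit
    // is reached so no more data is pulled before it can be tracked.
    // With limit <= 0 the count never equals the limit, so reading is never paused.
    void onRequestRead() {
        if (inProgress.incrementAndGet() == limit) {
            channel.setAutoRead(false);
        }
    }

    // The response was sent: resume reading when the count drops back below the limit.
    void onResponseSent() {
        if (inProgress.decrementAndGet() == limit - 1) {
            channel.setAutoRead(true);
        }
    }
}
```

   Requests that were already decoded may still arrive after autoread has been turned off. Pausing and resuming only on the exact crossing of the limit keeps the calls balanced in that case. A server-wide limit would have to toggle every open channel. A production version would also need to guard against pause/resume calls from different threads being applied out of order. This sketch does neither.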
   
   3. Limit the number of requests in the asynchronous write path (LedgerStorage).
   
   InterleavedLedgerStorage will naturally block if writes are slowed down by, e.g., fsync.
   SortedLedgerStorage has a naive throttling implementation that blocks a request for 1ms if a checkpoint (memtable flush) is in progress. This is replaced with blocking until space in the memtable is available. The limit is set to 2*(skipListSize), where skipListSize is the limit that triggers a memtable flush.
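   Here is a sketch of the memtable throttle described above. It assumes that writers reserve bytes before inserting into the memtable and that the flush releases them afterwards. `MemtableThrottle`, `acquire` and `release` are hypothetical names. The timeout is an addition for the sketch; the text only says to block until space is available.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical write throttle for memtable-based storage: writers block while
// buffered bytes would exceed 2 * skipListSize (presumably room for one memtable
// being flushed plus one being filled).
class MemtableThrottle {
    private final long maxBytes;
    private long bufferedBytes;
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition spaceAvailable = lock.newCondition();

    MemtableThrottle(long skipListSize) {
        this.maxBytes = 2 * skipListSize;
    }

    // Reserve space for an entry, waiting up to the timeout for a flush to free
    // memory. Returns false on timeout or interrupt.
    boolean acquire(long entrySize, long timeout, TimeUnit unit) {
        long nanos = unit.toNanos(timeout);
        lock.lock();
        try {
            while (bufferedBytes + entrySize > maxBytes) {
                if (nanos <= 0) {
                    return false;
                }
                nanos = spaceAvailable.awaitNanos(nanos);
            }
            bufferedBytes += entrySize;
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            lock.unlock();
        }
    }

    // Called after a flush (checkpoint) has persisted flushedBytes to disk.
    void release(long flushedBytes) {
        lock.lock();
        try {
            bufferedBytes -= flushedBytes;
            spaceAvailable.signalAll();
        } finally {
            lock.unlock();
        }
    }
}
```

   An entry larger than 2*skipListSize could never fit, so a real implementation would need to handle that case separately.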
    
   4. Sending a response must respect Netty’s isWritable() flag and wait up to a certain timeout if needed. After the timeout, either drop the response (the client will not hear about that request) or close the channel (the disconnect notifies the client that responses to requests from that connection will never arrive).
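   Here is a sketch of the writability check with both timeout policies. `ResponseChannel`, `OnTimeout` and `BackpressuredResponder` are hypothetical. In Netty, the corresponding calls would be `Channel.isWritable()`, `Channel.writeAndFlush(Object)` and `Channel.close()`. Polling with a short sleep keeps the example self-contained. A Netty implementation would more likely react to `channelWritabilityChanged` events.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical view of a connection; in Netty these would be Channel.isWritable(),
// Channel.writeAndFlush(Object) and Channel.close().
interface ResponseChannel {
    boolean isWritable();
    void write(Object response);
    void close();
}

// What to do when the channel stays unwritable past the timeout.
enum OnTimeout { DROP_RESPONSE, CLOSE_CHANNEL }

class BackpressuredResponder {
    private final long timeoutNanos;
    private final OnTimeout policy;

    BackpressuredResponder(long timeout, TimeUnit unit, OnTimeout policy) {
        this.timeoutNanos = unit.toNanos(timeout);
        this.policy = policy;
    }

    // Waits (by polling) until the channel is writable, then writes the response.
    // Returns true if the response was written. Must not run on the event-loop thread.
    boolean send(ResponseChannel ch, Object response) {
        long deadline = System.nanoTime() + timeoutNanos;
        while (!ch.isWritable()) {
            if (System.nanoTime() - deadline >= 0) {
                if (policy == OnTimeout.CLOSE_CHANNEL) {
                    // Disconnect tells the client no more responses will come on this connection.
                    ch.close();
                }
                // DROP_RESPONSE: the client never hears about this request.
                return false;
            }
            try {
                Thread.sleep(1);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        ch.write(response);
        return true;
    }
}
```

   The wait must not run on the channel’s event-loop thread. Blocking that thread would stop the outbound buffer from draining, so the channel could never become writable again.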
   
   Master Issue: #1409 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
With regards,
Apache Git Services