ivankelly opened a new pull request #5356: [TIEREDSTORAGE] Only seek when 
reading unexpected entry
URL: https://github.com/apache/pulsar/pull/5356
 
 
   The normal pattern for reading from an offloaded ledger is that the
   reader will read the ledger sequentially from start to end. This means
   that once a user reads an entry, we should expect that the next entry
   they read will be the next entry in the ledger.
   
   The initial implementation of the BlobStoreBackedReadHandleImpl (and
   the S3 variant that preceded it) didn't take this into
   account. Instead it did a lookup in the index each time, to find the
   block that contained the entry, and then read forward in the block
   until it found the entry requested. This is fine for the first few
   entries in the block, not so much for the last.
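   
   To illustrate why the last entries in a block are the expensive
   ones, here is a rough sketch that counts how many entries get
   skipped when every read restarts from the beginning of the block.
   The class and method names are hypothetical and are not part of the
   Pulsar code base; the sketch assumes one entry is requested per read
   and that all entries of the block are read in order.

```java
// Hypothetical illustration only; not code from this PR.
final class ScanCost {
    // Total entries skipped when each read seeks back to the start of the
    // block and scans forward to the requested entry: reading the entry at
    // position i within the block first skips the i entries before it.
    static long skippedWithLookupPerRead(int entriesPerBlock) {
        long skipped = 0;
        for (int i = 0; i < entriesPerBlock; i++) {
            skipped += i;
        }
        return skipped;
    }

    public static void main(String[] args) {
        // The total grows quadratically with the number of entries per block.
        System.out.println(skippedWithLookupPerRead(1000));
    }
}
```

   A reader that simply continues from its current position skips
   nothing in the sequential case, which is the behaviour this PR is
   aiming for.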
   
   This PR changes the read behaviour to only seek if the entryId read
   from the block is either:
   - greater than the entry we were expecting to read, in which case we
     need to seek backwards in the block.
   - less than the entry expected, but also belonging to a different
     block to the expected entry, in which case we need to seek to the
     correct block.
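   
   The two conditions above can be sketched roughly as follows. This is
   a simplified illustration, not the code in the PR: the SeekDecision
   class, its TreeMap-based index and all method names are assumptions
   made for the example. The index here maps the first entry id of each
   block to the byte offset of that block.

```java
import java.util.TreeMap;

// Hypothetical sketch of the seek decision; names are not from the PR.
final class SeekDecision {
    // Assumed index layout: first entry id of a block -> block offset.
    private final TreeMap<Long, Long> blockOffsets = new TreeMap<>();

    SeekDecision addBlock(long firstEntryId, long offset) {
        blockOffsets.put(firstEntryId, offset);
        return this;
    }

    // Identify a block by the first entry id it contains.
    long blockOf(long entryId) {
        Long first = blockOffsets.floorKey(entryId);
        if (first == null) {
            throw new IllegalArgumentException("no block for entry " + entryId);
        }
        return first;
    }

    // True if the stream must be repositioned before reading on.
    boolean needsSeek(long entryIdRead, long expectedEntryId) {
        if (entryIdRead > expectedEntryId) {
            // We are past the expected entry: seek backwards.
            return true;
        }
        // Behind the expected entry: only seek if it lives in another block.
        // Otherwise reading forward in the current block will reach it.
        return entryIdRead < expectedEntryId
                && blockOf(entryIdRead) != blockOf(expectedEntryId);
    }

    // Offset of the block that holds the expected entry.
    long seekOffset(long expectedEntryId) {
        return blockOffsets.get(blockOf(expectedEntryId));
    }

    public static void main(String[] args) {
        SeekDecision index = new SeekDecision().addBlock(0, 0).addBlock(100, 4096);
        System.out.println(index.needsSeek(3, 5));    // same block, read forward
        System.out.println(index.needsSeek(50, 150)); // different block, seek
    }
}
```

   When the entry read matches the expected one, or is behind it in the
   same block, no seek is issued and the reader just keeps consuming
   the stream.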
   
   This change improves read performance significantly. Ad hoc benchmarks
   show that we can read from offloaded topics at ~160MB/s whereas
   previously we could only manage <10MB/s.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
