sodonnel commented on PR #6613:
URL: https://github.com/apache/ozone/pull/6613#issuecomment-3403313238

   > While this will reduce memory usage, it may slow down seeking performance.
   > Is this the intended trade-off?
   
   While only the current buffer and one pending buffer are held in the
reader, there is scope to configure the size of that queue. However, what I
found in practice is that gRPC seems to buffer about 1MB of data on the
connection socket, so when an item is consumed from the queue it is
immediately refilled.
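   A minimal sketch of the "current buffer plus a bounded queue of pending buffers" idea, assuming the stream delivers chunks through a callback. `BoundedStreamReader`, `onChunk` and `onCompleted` are hypothetical names, not Ozone or gRPC APIs. For brevity the producer simply blocks when the queue is full; real gRPC code would more likely apply backpressure through flow control than block the callback thread. The ~1MB of socket-level buffering mentioned above is not modelled here.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: holds the buffer being consumed plus at most
// `queueCapacity` pending buffers delivered by a stream callback.
final class BoundedStreamReader {
  // End-of-stream sentinel, compared by identity.
  private static final ByteBuffer EOF = ByteBuffer.allocate(0);

  private final BlockingQueue<ByteBuffer> pending;
  private ByteBuffer current = ByteBuffer.allocate(0);
  private boolean finished = false;

  BoundedStreamReader(int queueCapacity) {
    this.pending = new ArrayBlockingQueue<>(queueCapacity);
  }

  // Producer side (e.g. a stream's onNext): blocks while the queue is full,
  // which keeps client-side memory bounded.
  void onChunk(ByteBuffer chunk) throws InterruptedException {
    pending.put(chunk);
  }

  void onCompleted() throws InterruptedException {
    pending.put(EOF);
  }

  // Consumer side: reads up to len bytes, returns -1 at end of stream.
  int read(byte[] dst, int off, int len) throws InterruptedException {
    if (len == 0) return 0;
    while (!current.hasRemaining()) {
      if (finished) return -1;
      ByteBuffer next = pending.take(); // frees a slot for the producer
      if (next == EOF) { finished = true; return -1; }
      current = next;
    }
    int n = Math.min(len, current.remaining());
    current.get(dst, off, n);
    return n;
  }
}

class BoundedStreamReaderDemo {
  public static void main(String[] args) throws Exception {
    BoundedStreamReader reader = new BoundedStreamReader(1); // one pending buffer
    Thread producer = new Thread(() -> {
      try {
        for (int i = 0; i < 4; i++) {
          reader.onChunk(ByteBuffer.wrap(new byte[] {(byte) i, (byte) i}));
        }
        reader.onCompleted();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    producer.start();
    byte[] buf = new byte[3];
    int total = 0, n;
    while ((n = reader.read(buf, 0, buf.length)) != -1) total += n;
    producer.join();
    System.out.println(total); // 8
  }
}
```

The queue capacity passed to the constructor is the knob referred to above: raising it trades client memory for more read-ahead.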
   
   Compared to the existing implementation in Ozone, with 16KB checksums the
time to read 1GB dropped from about 30 seconds to 7 seconds in local tests.
With 1MB checksums it went from 4.5s to 2.6s.
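   To put those timings in throughput terms, a small calculation, assuming 1GB = 1024MB. It works out to roughly 34 to 146 MB/s (about 4.3x) with 16KB checksums and roughly 228 to 394 MB/s (about 1.7x) with 1MB checksums. `ReadThroughput` is a hypothetical helper used only for this arithmetic.

```java
// Hypothetical helper: converts the read timings quoted above into MB/s and speedup.
final class ReadThroughput {
  static double mbPerSec(double sizeMb, double seconds) {
    return sizeMb / seconds;
  }

  static double speedup(double secondsBefore, double secondsAfter) {
    return secondsBefore / secondsAfter;
  }

  public static void main(String[] args) {
    double size = 1024; // assumes 1GB = 1024MB
    System.out.printf("16KB checksums: %.0f -> %.0f MB/s (%.1fx)%n",
        mbPerSec(size, 30), mbPerSec(size, 7), speedup(30, 7));
    System.out.printf("1MB checksums:  %.0f -> %.0f MB/s (%.1fx)%n",
        mbPerSec(size, 4.5), mbPerSec(size, 2.6), speedup(4.5, 2.6));
  }
}
```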
   
   Seeks are another matter. I have not attempted to benchmark them yet, but
they would tend to be slow with any approach: we can only cache a reasonable
amount of data in the client (say at most 4MB), so there is every chance a seek
takes you outside of that.
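   A sketch of why seeks are the weak spot, assuming the client keeps a single contiguous cached window capped at a fixed size (4MB in the example). `SeekWindow` and its methods are hypothetical: a seek inside the window is served locally, while any seek outside it drops the window and would require a new read starting at the target offset.

```java
// Hypothetical sketch: one cached window [start, start + length) capped at maxWindowBytes.
final class SeekWindow {
  private final long maxWindowBytes;
  private long start = 0;
  private long length = 0;
  private int restarts = 0;

  SeekWindow(long maxWindowBytes) {
    this.maxWindowBytes = maxWindowBytes;
  }

  // Record that bytes [start, start + length) are buffered, capped at the window size.
  void buffered(long start, long length) {
    this.start = start;
    this.length = Math.min(length, maxWindowBytes);
  }

  // True if the seek can be served from the cached window; otherwise the window
  // is dropped and the caller would have to start a new read at pos.
  boolean seek(long pos) {
    if (pos >= start && pos < start + length) return true;
    start = pos;
    length = 0;
    restarts++;
    return false;
  }

  int restarts() {
    return restarts;
  }

  public static void main(String[] args) {
    long mb = 1024L * 1024;
    SeekWindow w = new SeekWindow(4 * mb);
    w.buffered(0, 4 * mb);
    System.out.println(w.seek(2 * mb));   // true: inside the 4MB window
    System.out.println(w.seek(100 * mb)); // false: outside, new read needed
    System.out.println(w.restarts());     // 1
  }
}
```

For a block much larger than the window, a random seek almost always lands outside it, whatever the read path looks like.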
   
   The approach I have implemented simplifies the code quite a bit, I think,
and it reads the entire block in a single RPC call, opening the block file on
the server only once.
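   A server-side sketch of "one RPC, file opened once", assuming the handler pushes chunks to a response stream. The stream is modelled with a plain `java.util.function.Consumer` instead of a gRPC `StreamObserver` to stay self-contained, and `BlockStreamer.streamRange` is a hypothetical name, not the code in this PR.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class BlockStreamer {
  // Opens the block file once and pushes every chunk of [offset, offset + length)
  // to the same response stream. Returns the number of bytes sent.
  static long streamRange(Path blockFile, long offset, long length, int chunkSize,
                          Consumer<ByteBuffer> onNext) throws IOException {
    long sent = 0;
    try (FileChannel ch = FileChannel.open(blockFile, StandardOpenOption.READ)) {
      long end = Math.min(offset + length, ch.size());
      long pos = offset;
      while (pos < end) {
        ByteBuffer buf = ByteBuffer.allocate((int) Math.min(chunkSize, end - pos));
        while (buf.hasRemaining()) {
          // Positional read: no per-chunk open/close and no shared file pointer.
          if (ch.read(buf, pos + buf.position()) < 0) break; // file shrank
        }
        buf.flip();
        int got = buf.remaining();
        if (got == 0) break;
        onNext.accept(buf);
        pos += got;
        sent += got;
      }
    }
    return sent;
  }

  public static void main(String[] args) throws IOException {
    Path tmp = Files.createTempFile("block", ".bin");
    Files.write(tmp, new byte[10]);
    List<Integer> sizes = new ArrayList<>();
    long sent = streamRange(tmp, 0, 10, 4, b -> sizes.add(b.remaining()));
    System.out.println(sizes + " " + sent); // [4, 4, 2] 10
    Files.delete(tmp);
  }
}
```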
   
   I think it will be much better than the current code in Ozone, and it can
of course be refined further once the first version is committed and more
extensive testing has been done.
   
   I still have a bit more work to do on this PR around testing and retrying 
reads on failed DNs.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

