sodonnel commented on PR #6613: URL: https://github.com/apache/ozone/pull/6613#issuecomment-3403313238
> While this will reduce memory usage, it may slow down seeking performance. Is this the intended trade-off?

While only the current buffer and one pending buffer are stored in the reader, there is scope to configure the size of that queue. However, what I found in practice is that gRPC seems to buffer about 1MB of data on the connection socket, so when an item is consumed from the queue it is refilled immediately.

Compared to the existing implementation in Ozone, with 16KB checksums the time to read 1GB dropped from about 30 seconds to 7 seconds in local tests. With 1MB checksums it went from 4.5s to 2.6s.

Seeks are another matter. I have not attempted to benchmark them yet, but they would tend to be slow with any approach, as we can only cache a reasonable amount of data in the client, say at most 4MB, so there is every chance a seek takes you outside of that.

I think the approach I have implemented simplifies the code quite a bit: it reads the entire block in a single RPC call and opens the block file on the server only once. I think it will be much better than the current code in Ozone, and it can of course be refined further once we get the first version committed and more extensive testing done.

I still have a bit more work to do on this PR around testing and retrying reads on failed DNs.
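For readers following along, here is a rough sketch of the "current buffer plus one pending buffer" idea described above. It is not the code in this PR: the class `BoundedStreamReader`, its `onNext`/`onCompleted` methods, and the `demo` helper are hypothetical names chosen for illustration, and a plain thread stands in for the gRPC stream. The only point it tries to show is that a queue of capacity 1 makes the producer block until the reader consumes a buffer, which keeps client memory bounded.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical sketch of a reader holding one current and N pending buffers. */
final class BoundedStreamReader {
  // Sentinel queued by the producer to signal the end of the block.
  private static final ByteBuffer EOF = ByteBuffer.allocate(0);

  private final BlockingQueue<ByteBuffer> pending;
  private ByteBuffer current = ByteBuffer.allocate(0);
  private boolean eof;

  BoundedStreamReader(int pendingBuffers) {
    this.pending = new ArrayBlockingQueue<>(pendingBuffers);
  }

  /** Producer side: blocks while the queue is full, giving back-pressure. */
  void onNext(byte[] chunk) throws InterruptedException {
    pending.put(ByteBuffer.wrap(chunk));
  }

  /** Producer side: marks the end of the stream. */
  void onCompleted() throws InterruptedException {
    pending.put(EOF);
  }

  /** Consumer side: copies up to len bytes, or returns -1 at end of stream. */
  int read(byte[] dst, int off, int len) throws InterruptedException {
    if (len == 0) {
      return 0;
    }
    while (!current.hasRemaining()) {
      if (eof) {
        return -1;
      }
      ByteBuffer next = pending.take(); // frees a slot so the producer can refill
      if (next == EOF) {
        eof = true;
        return -1;
      }
      current = next;
    }
    int n = Math.min(len, current.remaining());
    current.get(dst, off, n);
    return n;
  }

  /** Streams totalBytes in chunkSize pieces and returns the bytes read. */
  static long demo(int totalBytes, int chunkSize) {
    BoundedStreamReader reader = new BoundedStreamReader(1); // one pending buffer
    Thread producer = new Thread(() -> {
      try {
        for (int sent = 0; sent < totalBytes; sent += chunkSize) {
          reader.onNext(new byte[Math.min(chunkSize, totalBytes - sent)]);
        }
        reader.onCompleted();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    producer.start();
    try {
      byte[] buf = new byte[4096];
      long total = 0;
      int n;
      while ((n = reader.read(buf, 0, buf.length)) != -1) {
        total += n;
      }
      producer.join();
      return total;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while reading", e);
    }
  }
}
```

Raising the constructor argument is the "configure the size of that queue" knob mentioned above. Seek handling and retries against another DN are deliberately left out of this sketch.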
