Hi,

> My understanding of Majid's use-case for tuning MAX_SEND_SIZE is that the
> bottleneck is storage, not network. The reason MAX_SEND_SIZE affects that is
> that it determines the max size passed to WALRead(), which in turn determines
> how much we read from the OS at once. If the storage has high latency but
> also high throughput, and readahead is disabled or just not aggressive enough
> after crossing segment boundaries, larger reads reduce the number of times
> you're likely to be blocked waiting for read IO.
>
> Which is also why I think that making MAX_SEND_SIZE configurable is a really
> poor proxy for improving the situation.
>
> We're imo much better off working on read_stream.[ch] support for reading WAL.
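To spell out the mechanism for the archives: MAX_SEND_SIZE caps how much XLogSendPhysical() hands to WALRead() in a single call (XLOG_BLCKSZ * 16, i.e. 128 kB with the default block size), so it is also the upper bound on a single read from a WAL segment file. A rough way to check whether larger per-call reads would even help on such storage is to compare sequential reads at 128 kB against a bigger block size with O_DIRECT, so OS readahead doesn't mask the per-IO latency, e.g. with fio (file path and size below are just placeholders):

  # sequential reads at the current effective walsender read size (128 kB)
  fio --name=seq128k --filename=/path/to/testfile --size=2G \
      --rw=read --bs=128k --direct=1 --ioengine=psync

  # the same workload with larger reads, to see whether they amortize the latency
  fio --name=seq1m --filename=/path/to/testfile --size=2G \
      --rw=read --bs=1M --direct=1 --ioengine=psync

If the second run is much faster, that would support the "bigger reads (or read_stream-style readahead) help here" theory; if not, the bottleneck is probably elsewhere.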
Well, then that would be a consistent message at least, because earlier in [1] a patch to prefetch WAL segments, on the standby side that time, was rejected on the grounds that it only helped in configurations that had readahead *disabled* for some reason.

Now, Majid stated that he uses "RBD". Majid, any chance you could specify what that RBD really is? What's the tech? What filesystem? Any ioping or fio results? What does blockdev --report /dev/XXX show? (You stated "high" latency and "high" bandwidth, but that's relative: for me, 15ms+ is high latency and >1000 MB/s is high sequential bandwidth. It would help others in the future if you could give the exact numbers, please.) Maybe it's just a matter of enabling readahead there (like in [1]) and/or using a larger WAL segment size at initdb time.

[1] - https://www.postgresql.org/message-id/flat/CADVKa1WsQMBYK_02_Ji%3DpbOFnms%2BCT7TV6qJxDdHsFCiC9V_hw%40mail.gmail.com
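P.S. To be concrete about the numbers I'm asking for, something along these lines would do (the device name, mount point and data directory are placeholders):

  # device geometry and current readahead setting (RA is in 512-byte sectors)
  blockdev --report /dev/XXX
  blockdev --getra /dev/XXX

  # latency of individual small reads on the filesystem holding pg_wal
  ioping -c 10 /path/to/pg_wal

  # if readahead turns out to be disabled or tiny, it can be raised, e.g. to 128 kB
  blockdev --setra 256 /dev/XXX

  # a larger WAL segment size is chosen at initdb time, e.g. 64 MB
  initdb --wal-segsize=64 -D /path/to/datadir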