We have two conflicting goals:
1. We want quick responses, especially for realtime requests but also in general, at the layer of requests, inserts, sending messages, etc.
2. We want TCP-like behaviour that tolerates packet loss and works with firewalls.

It turns out that in some common cases these two goals are incompatible: they point in opposite directions.

There are a fair number of places where we use a 10-second timeout. For instance, when we send an Accepted to a CHK inserter, it should reply within 10 seconds with a DataInsert. The problem is:
- The RFCs for TCP specify that retransmit timeouts start at ~2 seconds and increase up to ~120 seconds. Note that before we implemented this (RFC 2988) we had massive amounts of retransmission.
- We send keepalives every 7-14 seconds (previously 14-28 seconds) and time out a connection if we receive no packets for 35 seconds (previously 60 seconds).

Network congestion and temporary glitches can cause severe latency, and severe latency is incompatible with quick responses. For instance, my connection seems to get 15-second pauses during which no packets are received from a given peer. This could be a problem with my ISP or my router, but if I get it, a lot of other people probably get it too.

We need to deal with this properly, but how?

In most places we now implement a two-stage timeout. If we get a response within a short period, we proceed; if we don't, we wait (off-thread, while the main thread reroutes) for a longer period. If we still don't get a response, something is seriously wrong, so we disconnect. Lately, problems seem to come up mostly in the few places that don't do this, so arguably the default strategy should be to implement a two-stage timeout in those places as well. But it's a bit of a mess; maybe there is a better solution?

Another option would be a much shorter connection timeout (which would require sending keepalives much more often), or a much longer timeout in many of these cases.

Thoughts?
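To illustrate why a 10-second deadline collides with TCP-style retransmission, here is a small Java sketch that lists when a single, repeatedly lost packet would be resent. `RetransmitSchedule` is a hypothetical name, not Freenet code. It assumes the approximate figures quoted above (an initial timeout of about 2 seconds, capped at about 120 seconds) and the doubling backoff on each timer expiry from RFC 2988. It ignores round-trip time and RTT-based adjustment of the timeout, so real delays would be somewhat longer.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: when does a repeatedly lost packet get resent? */
final class RetransmitSchedule {
    /**
     * Returns the send times (ms after the first send) of the first {@code count}
     * transmissions of one packet, assuming every earlier copy was lost.
     * The timeout doubles after each expiry (RFC 2988 backoff) but never exceeds capMs.
     */
    static List<Long> sendTimes(long initialRtoMs, long capMs, int count) {
        List<Long> times = new ArrayList<>();
        long t = 0;
        long rto = initialRtoMs;
        for (int i = 0; i < count; i++) {
            times.add(t);
            t += rto;                       // wait for the retransmit timer to expire
            rto = Math.min(rto * 2, capMs); // back off for the next attempt
        }
        return times;
    }

    public static void main(String[] args) {
        // Assumed figures from the text: ~2 s initial timeout, ~120 s cap.
        List<Long> times = sendTimes(2_000, 120_000, 8);
        for (int i = 1; i < times.size(); i++) {
            long t = times.get(i);
            System.out.printf("after %d loss(es): resend at %.0f s%s%n",
                    i, t / 1000.0, t > 10_000 ? "  (past a 10 s deadline)" : "");
        }
    }
}
```

With these numbers, three consecutive losses push the next resend to 14 s, already past a 10-second deadline before any round-trip time is added.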
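The two-stage timeout described above could be sketched roughly as follows. This is a minimal illustration assuming Java 9+ `CompletableFuture`, not the existing Freenet implementation; `TwoStageTimeout`, `await`, and the `reroute`/`disconnect` callbacks are hypothetical names. The caller blocks for at most `shortMs`. If no reply has arrived by then, it reroutes immediately, while `orTimeout` watches the same future for a further `longMs` and triggers a disconnect if nothing turns up.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper sketching a two-stage timeout. */
final class TwoStageTimeout {
    enum FirstStage { REPLIED, REROUTED }

    static <T> FirstStage await(CompletableFuture<T> reply, long shortMs, long longMs,
                                Runnable reroute, Runnable disconnect) throws InterruptedException {
        try {
            // Stage 1: block the caller for at most shortMs.
            reply.get(shortMs, TimeUnit.MILLISECONDS);
            return FirstStage.REPLIED;
        } catch (ExecutionException e) {
            // The reply itself failed; rethrow unchecked to keep the sketch small.
            throw new CompletionException(e.getCause());
        } catch (TimeoutException e) {
            // Stage 2: give the peer a further longMs. orTimeout() completes the future
            // with a TimeoutException if nothing arrives in time; that completion, and
            // so the disconnect below, happens on the JDK's delay thread, not the caller's.
            reply.orTimeout(longMs, TimeUnit.MILLISECONDS).whenComplete((value, err) -> {
                if (err instanceof TimeoutException) {
                    disconnect.run(); // still nothing: something is seriously wrong
                }
            });
            // Meanwhile the calling thread moves on and tries another route.
            reroute.run();
            return FirstStage.REROUTED;
        }
    }
}
```

One consequence of using `orTimeout` here is that it completes the caller's own future exceptionally, so a reply that arrives after the long stage is simply dropped; that seems consistent with disconnecting the peer at that point.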
I'm specifically asking for input from nextgens, because of his experience with low-level networking in general, and from Ian, because of his experience with Dijjer and because he originally pushed for short timeouts.