A user on Frost reported that, despite the recent fixes, he still gets a problem 
resembling the high-bandwidth-usage-way-over-the-limit-no-payload bug. He only 
has the problem when a specific peer is connected. He sent me logs, and having 
analysed them it appears that:
- While the global bandwidth limit is the dominant factor (i.e. the real 
constraint on how fast we send), the AIMD maximum window increases indefinitely 
(illustrated in the sketch after this list).
- Once the external bandwidth pressure (from other peers) abates, we start 
sending packets at a much faster rate than the link can handle.
- If the link then goes down, has gaps (e.g. because it's over wifi), or 
suffers severe latency, we may have major problems: we continue to send packets 
far too quickly, and when they are not acknowledged we do slow down the 
addition of new packets to send, but by then we may have a lot of packets 
queued, and we will have to retransmit them.
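
As a minimal sketch of the failure mode (the class and method names below are 
made up for illustration; this is not the actual throttle code): the AIMD 
update grows its window on every successful acknowledgement, even though the 
global bandwidth limiter - not this window - is what is actually pacing our 
sends, so the window drifts upwards without bound.

// Hypothetical sketch of the failure mode, not Freenet's actual code.
class NaiveAimdThrottle {
    private double windowSize = 2.0; // in packets; effectively caps the send rate

    // Called whenever a packet to this peer is acknowledged.
    void onAcked() {
        // Additive increase on every ack, whether or not this window was
        // what actually limited us. If the global bandwidth limiter is the
        // real constraint, this just grows forever.
        windowSize += 1.0 / windowSize;
    }

    // Called when a packet to this peer is judged lost.
    void onLost() {
        // Multiplicative decrease - but losses are rare while the global
        // limiter is doing the pacing, so this hardly ever fires.
        windowSize = Math.max(2.0, windowSize / 2.0);
    }

    double getWindowSize() {
        return windowSize;
    }
}

Once the other peers stop using bandwidth, the inflated windowSize lets us 
burst far faster than this peer's link can absorb.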

The long-term cause of this is the fact that our link layer code is not all 
that TCP-like and needs to be rewritten; it does congestion avoidance at a 
higher level than it needs to, and for example the congestion window is 
expressed as a flow rate rather than as an actual window of packets in flight... 

The interaction between the bandwidth limiter and the congestion avoidance 
algorithm is, however, interesting in itself. For TCP, bandwidth limiting can 
be done externally (by dropping packets that go over the limit) or internally 
(by limiting the flow of data before it reaches TCP). It is not clear that TCP 
(RFC 2581) adequately handles the latter case. There is a note in section 4.1 
to the effect that if a connection is idle for a long period and then restarted 
you can get a burst (so reset to slow-start if the connection has been idle for 
a while), but it appears to me that the same is possible if you have a trickle 
of data for a long period and then a burst: the congestion window keeps growing 
during the trickle, so TCP will send the burst at something approaching the 
maximum possible speed, and get into trouble. IMHO the solution for us is to 
not increase the congestion window size unless we have actually had a full 
window in flight recently. I'm not sure how this would fit into the 
NewTransportLayer...

The short-term solution appears to be the following (sketched in code after 
this list):
- Have an explicit window for each PacketThrottle: be able to count the number 
of packets in flight to a specific peer.
- Enforce the window, rather than enforcing the rate.
- Don't increase the window size (in congestion control) unless we have 
actually used the full window - had the full window in flight - within the 
last round trip time.
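
Roughly, the short-term fix might look like this per-peer throttle (illustrative 
names only, not the existing PacketThrottle API): a sender has to claim a slot 
in an explicit window before transmitting, acks and losses release slots, and 
the window only grows when it was actually full recently, as in the sketch 
above.

// Rough sketch of an explicit, enforced per-peer window (illustrative names).
class WindowedPacketThrottle {
    private final Object lock = new Object();
    private double windowSize = 2.0; // congestion window, in packets
    private int inFlight = 0;        // unacknowledged packets to this peer

    // Block until there is room in the window, then claim a slot.
    // This enforces the window itself rather than a derived rate.
    void waitForSendSlot() throws InterruptedException {
        synchronized (lock) {
            while (inFlight >= (int) windowSize)
                lock.wait();
            inFlight++;
        }
    }

    // Called when the peer acknowledges a packet. The caller passes in
    // whether the window was full at some point within the last RTT.
    void onAcked(boolean windowWasFullWithinLastRTT) {
        synchronized (lock) {
            inFlight--;
            if (windowWasFullWithinLastRTT)
                windowSize += 1.0 / windowSize; // additive increase
            lock.notifyAll();
        }
    }

    // Called on loss / retransmit timeout.
    void onLost() {
        synchronized (lock) {
            inFlight--;
            windowSize = Math.max(2.0, windowSize / 2.0); // multiplicative decrease
            lock.notifyAll();
        }
    }
}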

The new transport layer would have an explicit window, would do bandwidth 
limiting at a much lower level (including retransmits), would have a much 
larger maximum retransmission window, and would generally be much closer to TCP 
and work better on high-latency connections - but without the last provision 
above, it could still run into this kind of problem.

New transport layer:
http://wiki.freenetproject.org/NewTransportLayer
http://wiki.freenetproject.org/NewPacketFormat

Thoughts? The short-term fix should be reasonably easy. The new transport 
layer otoh could take some time.