Lots of people are complaining about the current state of the network, and 
rightly so. Most nodes have around 50% of their peers backed off, whereas a few 
months ago this figure was much lower. This results in misrouting, disappearing 
inserts, slow downloads, etc. (It is possible there are client layer bugs as 
well, but IMHO the network problems are by far the higher priority.) This 
situation has persisted for quite some time now. I had thought it was due to 
temporary disruption caused by the network expanding, first because of the 
healing changes and then because of German TV coverage, but it has gone on far 
longer than that would explain. There was also a buggy load management change, 
but that was fixed in 1277. The main backoff reason appears to be 
ForwardRejectedOverload, i.e. too many requests are being started, so many of 
them are rejected, and those rejections cause peers to be backed off.
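To make the link between rejections and the backed-off percentage concrete, 
here is a minimal sketch of per-peer backoff. It assumes that each 
ForwardRejectedOverload backs the peer off for a random time within a window 
that doubles on every consecutive rejection and resets on success. The class 
`PeerBackoff`, the helper `backed_off_fraction` and all constants are 
hypothetical; they illustrate the shape of the mechanism, not the node's 
actual code.

```python
import random


class PeerBackoff:
    """Hypothetical per-peer backoff state; constants are illustrative only."""

    INITIAL_MS = 1_000
    MAX_MS = 3 * 60 * 60 * 1_000  # assumed cap on the backoff window

    def __init__(self, rng=None):
        self.rng = rng if rng is not None else random.Random()
        self.length_ms = self.INITIAL_MS
        self.backed_off_until = 0

    def on_rejected_overload(self, now_ms):
        # Back off for a random time within the current window, then double it.
        self.backed_off_until = now_ms + self.rng.randint(1, self.length_ms)
        self.length_ms = min(self.length_ms * 2, self.MAX_MS)

    def on_success(self):
        # A successful request resets the window to its initial length.
        self.length_ms = self.INITIAL_MS

    def is_backed_off(self, now_ms):
        return now_ms < self.backed_off_until


def backed_off_fraction(peers, now_ms):
    """Fraction of peers currently backed off (the figure quoted above)."""
    return sum(p.is_backed_off(now_ms) for p in peers) / len(peers)
```

For example, if five of ten peers each see one rejection at the same moment, 
`backed_off_fraction` reports 0.5 at that moment. Sustained overload keeps 
re-triggering and lengthening the windows, so the fraction stays high.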

So what has changed between 1255 and 1277 that could conceivably affect load 
management?
- Block transfer abort propagation. This cannot directly cause rejections.
- Autoconfiguration of the thread limit by memory size. This could conceivably 
cause rejections, but if it did, a lot of nodes would show the thread limit as 
the top reason for rejection under "Preemptive Rejection Reasons" on the stats 
page in advanced mode. AFAICS this is not happening.
- MIN_NON_OVERHEAD, and a change to how the overhead fraction is calculated. 
The former wasn't active until quite recently due to a bug, so I doubt that 
turning it off again would make any difference. Both are designed to avoid a 
"death spiral", where a node has a high overhead fraction for a while (e.g. 
due to a mandatory update), then accepts very few requests because of that 
high overhead fraction, and as a result is left with low bandwidth usage and 
high overhead for a very long time.
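The feedback loop behind the death spiral can be illustrated with a toy model. 
This is an assumption-laden sketch, not the node's actual accounting: each step 
the node accepts payload traffic in proportion to its non-overhead fraction 
from the previous step, overhead is a fixed amount of traffic, and 
MIN_NON_OVERHEAD is modelled as a simple floor on that fraction (its real value 
and exact use are not given here). The function `simulate` and its parameters 
are hypothetical.

```python
def simulate(steps, capacity, overhead, min_non_overhead=0.0, initial_payload=0.0):
    """Toy model of the overhead-fraction feedback loop (assumed, not Freenet's formula).

    Each step, payload accepted = capacity * non-overhead fraction of the last step.
    `overhead` is a fixed, positive amount of non-payload traffic per step.
    """
    payload = initial_payload
    history = []
    for _ in range(steps):
        overhead_fraction = overhead / (overhead + payload)
        # The floor plays the role of MIN_NON_OVERHEAD in this sketch.
        non_overhead = max(1.0 - overhead_fraction, min_non_overhead)
        payload = capacity * non_overhead
        history.append(payload)
    return history


stuck = simulate(50, capacity=100, overhead=10)
recovered = simulate(50, capacity=100, overhead=10, min_non_overhead=0.2)
```

Without the floor, a node that starts from zero payload has an overhead 
fraction of 1 and therefore never accepts anything again; with a floor of 0.2 
it accepts some payload on the first step and converges to capacity minus 
overhead (90 here) within a few steps.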

The current load management system relies on the request *originator* 
reacting properly to the signals we send it and controlling its send rate via 
AIMDs (additive increase, multiplicative decrease, similar to TCP congestion 
control). IMHO the most likely explanation is that somebody is deliberately 
flooding the network with requests without applying that rate control. This is 
a relatively cheap attack that I would expect to be highly effective, and it 
cannot be eliminated except by replacing the load management code, which is 
why I have been working on the new-load-management branch for some time now.
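The attack argument rests on AIMD at the originator behaving like TCP 
congestion control. A minimal sketch, assuming a per-originator request window 
that grows by a fixed step after each accepted round and halves on 
ForwardRejectedOverload, and a single node with a fixed per-round capacity. 
`AIMDWindow`, `simulate_load` and all constants are hypothetical.

```python
class AIMDWindow:
    """Hypothetical AIMD controller for an originator's request window."""

    def __init__(self, initial=1.0, increase=1.0, decrease=0.5, minimum=1.0):
        self.window = initial
        self.increase = increase
        self.decrease = decrease
        self.minimum = minimum

    def on_accepted(self):
        self.window += self.increase  # additive increase

    def on_rejected_overload(self):
        # Multiplicative decrease, never below the minimum window.
        self.window = max(self.minimum, self.window * self.decrease)


def simulate_load(rounds, capacity, n_polite, flood_rate=0):
    """Toy model: count rounds where offered load exceeds `capacity`.

    Returns (overloaded_rounds, final polite windows). The flooder sends
    `flood_rate` requests every round and ignores all rejection signals.
    """
    senders = [AIMDWindow() for _ in range(n_polite)]
    overloaded = 0
    for _ in range(rounds):
        offered = sum(s.window for s in senders) + flood_rate
        if offered > capacity:
            overloaded += 1
            for s in senders:
                s.on_rejected_overload()  # polite originators slow down
        else:
            for s in senders:
                s.on_accepted()
    return overloaded, [s.window for s in senders]
```

Without a flooder, `simulate_load(100, 50, 5)` oscillates around the capacity 
and only a minority of rounds are overloaded. With `flood_rate=60`, more than 
the capacity on its own, every round is overloaded and the polite windows are 
pinned at their minimum: rate control only works if the senders cooperate.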

Of course, sending this email virtually guarantees that somebody will try it, 
if they haven't already. But we discussed it on the IRC channel some time back, 
and IMHO communicating with the community is important, especially when the 
fix is likely to take quite some time and be rather difficult.

Deploying the "fairness between peers" code may help a bit. I may also start 
merging parts of the new load management branch before it is finished, as much 
of it is relevant even without the full framework.
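How the "fairness between peers" code works is not described here. As one 
possible illustration only, this sketch assumes a node with a fixed number of 
request slots that refuses new requests from any peer already holding an equal 
share of them. `FairAcceptor` and its methods are hypothetical, not the actual 
code.

```python
from collections import Counter


class FairAcceptor:
    """Hypothetical fair-share admission control across peers."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.in_flight = Counter()  # requests currently running, per peer

    def try_accept(self, peer, active_peers):
        if sum(self.in_flight.values()) >= self.capacity:
            return False  # node is full
        fair_share = self.capacity / max(1, active_peers)
        if self.in_flight[peer] >= fair_share:
            return False  # this peer already holds its share; leave room for others
        self.in_flight[peer] += 1
        return True

    def complete(self, peer):
        # Release one slot when a request from `peer` finishes.
        self.in_flight[peer] -= 1
        if self.in_flight[peer] <= 0:
            del self.in_flight[peer]


acc = FairAcceptor(capacity=10)
flooder_accepted = sum(acc.try_accept("flooder", active_peers=2) for _ in range(100))
honest_accepted = sum(acc.try_accept("honest", active_peers=2) for _ in range(3))
```

Here the flooder gets only 5 of the 10 slots despite sending 100 requests, so 
the honest peer's 3 requests still get through. A strict cap like this wastes 
slots when other peers are idle, so a real scheme would probably let a peer 
exceed its share while capacity is spare. Either way, fairness limits the 
damage one flooding peer does to its neighbours but does not stop the flood at 
its source.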

_______________________________________________
Devl mailing list
[email protected]
http://freenetproject.org/cgi-bin/mailman/listinfo/devl
