I have an answer to why, but not a solution to how to deal with it.

THE PROBLEM:

I am fairly sure that the reason for this is as follows:
Realtime and bulk requests are treated separately for load management.
We can accept up to half of our request capacity from a single peer, on the 
basis that a burst is acceptable if there is no other traffic.
Realtime requests tend to be bursty, so at first glance this makes sense.
Now the problem: WE SHARE BANDWIDTH FAIRLY BETWEEN OUR PEERS (in the packet 
sender loop), REGARDLESS OF THE PRIORITY OF THE QUEUED MESSAGES!

Thus we can accept a load of requests - an fproxy burst, which is perfectly 
normal - thinking we can allocate half our bandwidth to one peer. (The output 
bandwidth liability roughly matches the 5-seconds-per-block criterion, although 
it's a tight fit if there is bulk traffic happening as well.) Then we share our 
bandwidth equally between peers, and assuming the other peers have some 
traffic, we have much less bandwidth available than we thought, so the 
abnormally large number of transfers to that single peer fail.
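
To make the mismatch concrete, here is a toy calculation in Java - hypothetical 
numbers and made-up class names, not code from fred:

    // Toy illustration of the acceptance-vs-fair-sharing mismatch.
    public class BurstMismatch {
        public static void main(String[] args) {
            double outputBandwidth = 20.0; // KB/sec for the whole node (hypothetical)
            int peers = 40;

            // Request acceptance assumes one peer may use up to half our bandwidth:
            double assumedPeerBandwidth = outputBandwidth / 2.0;  // 10 KB/sec
            // ...but the fair packet sender gives each busy peer only 1/peers:
            double actualPeerBandwidth = outputBandwidth / peers; // 0.5 KB/sec

            System.out.println("Transfers to the bursting peer run "
                    + (assumedPeerBandwidth / actualPeerBandwidth)
                    + "x slower than the acceptance logic assumed");
            // A factor-20 slowdown turns packet intervals that fit comfortably
            // inside the 5-second block timeout into ones that blow way past it.
        }
    }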

This is of course only a problem while it is a burst - after the first few hops 
it's pretty reliable. Which fits with the stats: If you do a lot of fproxy 
requests, you get a lot of failures, and the stats show this as low success 
rate; but if you just run a node, the success rate even for realtime block 
transfers is pretty good.

Unfortunately we need to have a lot of transfers in flight because of our 
relatively low success ratio - especially for bulk transfers. Although it's not 
that bad now ... Plus we gain security and bandwidth efficiency from always 
having a transfer in flight...

Unfortunately, apart from increasing the inter-block timeout to 10 seconds, it 
is not immediately clear what the solution is...

THE SOLUTION:

EASYISH STEPS:

First, I have increased the block timeout on realtime to 10 seconds. This may 
help somewhat in practice, as the observed timings are often only just over 5 
seconds; and the 60-second liability calculation cuts it rather close for 5 
seconds per block. However, it would be better if we could solve the underlying 
problem...

Reducing the message size within a data block is a possible compromise solution.

TOO MANY TRANSFERS?:

Hmmm, is the real issue simply accepting too many transfers? The number of 
realtime transfers we can accept is:
(bandwidth in KB/sec) * 60 / 32
Which is approximately (bandwidth in KB/sec) * 2.
The number of realtime transfers a single node can have is therefore 
approximately (bandwidth in KB/sec). This is the bandwidth usage after 
taking into account that some bandwidth will be used for things other than 
transfers - but it doesn't count bulk transfers; both realtime and bulk make 
their own calculations based on the total.
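
As a worked example of that formula (hypothetical numbers; only the 32KB block 
size and 60-second liability window are taken from the above):

    // Worked example of the realtime transfer acceptance limit.
    public class TransferLimits {
        public static void main(String[] args) {
            double bandwidth = 20.0; // KB/sec of output bandwidth (hypothetical)
            double window = 60.0;    // seconds of output bandwidth liability
            double blockSize = 32.0; // KB per block transfer

            double totalRealtime = bandwidth * window / blockSize; // ~ bandwidth * 2
            double singlePeer = totalRealtime / 2.0;               // half to one peer

            System.out.println("total realtime transfers: " + totalRealtime); // 37.5
            System.out.println("limit for a single peer:  " + singlePeer);    // 18.75 ~ bandwidth
        }
    }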
If a peer is using half our bandwidth and has (bandwidth in KB/sec) 
transfers, and is doing no bulk transfers, we should expect a typical packet 
interval of 2 seconds. This could build up over hops, but bear in mind that 
it's much less bursty the further away from the originator we go. If the peer 
has bulk transfers using half its bandwidth, we should expect a typical packet 
interval of 4 seconds. If we are sharing fairly between all peers, we should 
expect a typical packet interval of:
(number of transfers = bandwidth) / (peer bandwidth = bandwidth / number of peers)
Which comes out in seconds, and equals the number of peers. So for 40 peers, 
40 seconds. :| For bulk transfers, the block timeout is 30 seconds, but we 
accept twice as many transfers. However we are much less likely to get a big 
burst to a single node with bulk transfers.
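
The same numbers as a quick sanity check on those intervals (assuming roughly 
1KB data packets; everything else is hypothetical):

    // Packet-interval estimates for the three cases above.
    public class PacketIntervals {
        static double interval(double transfers, double peerBandwidth, double packetKB) {
            // Seconds between successive packets of any one transfer.
            return transfers * packetKB / peerBandwidth;
        }

        public static void main(String[] args) {
            double bandwidth = 20.0;      // KB/sec (hypothetical)
            int peers = 40;
            double transfers = bandwidth; // per-peer realtime transfer limit from above

            System.out.println(interval(transfers, bandwidth / 2.0, 1.0));   // 2s: peer has half our bandwidth
            System.out.println(interval(transfers, bandwidth / 4.0, 1.0));   // 4s: bulk eats half of that
            System.out.println(interval(transfers, bandwidth / peers, 1.0)); // 40s: fair share between 40 peers
        }
    }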
We could halve the total transfers limit to take into account the fact that 
there are both bulk and realtime transfers. We could then only allow one peer 
1/4 of the total rather than 1/2, halving it again. This brings us into 
plausible worst case timeout territory - 10 seconds for 40 peers on realtime, 
20 seconds for 40 peers on bulk. However the first step would be disruptive, 
and might result in significantly reduced average throughput. Arguably the 
problem is at the peer level, not at the total transfers level. Also, we should 
separate it from the total number of peers.
So the new proposal is that we allow one node to use bandwidth equivalent to 8 
nodes' guaranteed bandwidth, i.e. 4 nodes' full fair share, when calculating 
how many transfers (requests) it can have at once. This gives a typical packet 
interval of:
(number of transfers = bandwidth * 4 / number of peers) / (peer bandwidth = 
bandwidth / number of peers)
Which is 2.5 seconds for realtime. There is little point in using a different 
formula for bulk because bursts are rare on bulk.
Obviously we would have to ensure that this is lower than half the total if we 
have fewer than 8 peers.
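
A sketch of how I read that limit (my own reading, not the actual fred 
implementation; keeping the 60/32 factor in the per-peer calculation makes the 
half-the-total cap kick in exactly below 8 peers):

    // Proposed per-peer realtime transfer limit, as described above.
    public class ProposedPeerLimit {
        static double peerRealtimeTransferLimit(double bandwidthKBps, int peers) {
            double total = bandwidthKBps * 60.0 / 32.0; // total realtime transfers we accept
            // Transfers sustainable by 4 peers' full fair share of bandwidth over 60s:
            double perPeer = (bandwidthKBps * 4.0 / peers) * 60.0 / 32.0;
            // Never allow more than half the total (only matters below 8 peers):
            return Math.min(perPeer, total / 2.0);
        }

        public static void main(String[] args) {
            System.out.println(peerRealtimeTransferLimit(20.0, 40)); // 3.75
            System.out.println(peerRealtimeTransferLimit(20.0, 8));  // 18.75, exactly half the total
            System.out.println(peerRealtimeTransferLimit(20.0, 5));  // capped at 18.75
        }
    }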

Is this consistent with what's been observed?

Yes. The nodes that got timeouts on testnet had timeouts around 5 seconds - 
just over the limit. They also had very few peers, maybe 10 or so. That 
interval is not as high as the formula above predicts (roughly 10 seconds for 
10 peers), but it's in roughly the right region.

Okay, I have implemented this. I'm pretty sure it will work ...

OTHER IDEAS: Making PacketSender not fair between peers is HARD!

I don't want to eliminate bulk vs realtime. Especially with new load 
management, it makes a lot of sense to have the separation. For downloads we 
want a lot of throughput. For fproxy we want low latency. We generally can't 
have both so it makes sense to be able to choose one or the other. Especially 
as most of our traffic is bulk, and there are lots of interesting things we can 
do with bulk in terms of performance and security.

We don't want a peer with a retransmission problem to occupy all our bandwidth.
We don't want to only send to the peer with realtime messages queued, and lose 
the other peers.

Really all we need to do is give the peer 1/4 of our bandwidth: Half is for 
bulk, half is for realtime, and half of the latter is for this peer. Of course 
if we have 40 peers, this is still equivalent to 10 of our other peers ...
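
In toy numbers (hypothetical bandwidth, not fred code), that quarter-share idea 
looks like this:

    // The "give the bursting peer 1/4 of our bandwidth" idea in toy numbers.
    public class QuarterShare {
        public static void main(String[] args) {
            double bandwidth = 20.0; // KB/sec (hypothetical)
            int peers = 40;

            double realtimeShare = bandwidth / 2.0; // half for realtime, half for bulk
            double peerShare = realtimeShare / 2.0; // half of the realtime half for this peer

            System.out.println("bursting peer gets " + peerShare + " KB/sec");          // 5.0
            System.out.println("fair share would be " + bandwidth / peers + " KB/sec"); // 0.5
            // So the bursting peer still gets the equivalent of 10 peers' fair share.
        }
    }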

This might or might not relate to some long term ideas about bloom filter 
sharing and preemptive datastore transfer, which would need an "idle bandwidth" 
mechanism.

Another complication is if we choose a peer because it has high priority 
messages (such as requests), we will currently send a full sized packet, which 
might include low priority data; this is simple and efficient in terms of 
payload, but it may cause problems for another peer sending realtime data...


See bug:
https://bugs.freenetproject.org/view.php?id=4731