On Fri, Apr 21, 2006 at 10:09:00PM +0100, Michael Rogers wrote:
> 
> Matthew Toseland wrote:
> > Is there a non-alchemical way to define such a queueing algorithm?
> 
> Yup, see my message from the following morning about token buckets. I
> guess from the flood of messages that you've just dealt with a week's
> worth of mail. :-)
> 
> > Target rate t
> > Rate from B b
> > Rate from F f
> > 
> > We know:
> > b + f > t (otherwise we don't need to drop any)
> 
> Exactly - when capacity exceeds traffic, give everyone as much as they
> ask for; enforcing fairness when there's excess bandwidth would be
> cutting off our nose to spite our face.
> 
> > If b >> f, then we can simply allow through all F's requests and reject
> > (b+f-t) requests from B.
> 
> This is what a token bucket achieves. Each neighbour's bucket is filled
> with tokens at the same rate. (The bucket is actually just implemented
> as a counter, where the 'size' of the bucket is the upper limit of the
> counter.) Each token allows the neighbour to send a certain number of
> bytes. When a neighbour's bucket is empty, its requests are rejected.
> 
> *But* to avoid wasting bandwidth by enforcing fairness when there's
> excess capacity, tokens from overflowing buckets spill over into the
> emptiest bucket. So the bandwidth is distributed according to demand
> when there's enough to go round, and fairly when there isn't. There's no
> queueing delay, and the buckets allow for bursty traffic (larger buckets
> allow larger bursts, enforcing fairness over a longer timescale).

Aaaaah... that's how you do it. COOL!
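
So, to check I've understood it: each neighbour's bucket fills at the
same rate, and overflow goes to the emptiest bucket. Something like
this, perhaps (a rough sketch; all the names are mine, none of this is
existing code):

    // Per-neighbour fairness buckets with spillover (a sketch; every
    // name here is invented).
    class FairnessBuckets {
        private final long capacity;  // upper limit of each counter
        private final long[] tokens;  // one counter per neighbour

        FairnessBuckets(int neighbours, long capacity) {
            this.capacity = capacity;
            this.tokens = new long[neighbours];
        }

        // Called at a fixed interval: every bucket fills at the same
        // rate.
        synchronized void refill(long tokensPerNeighbour) {
            long spill = 0;
            for (int i = 0; i < tokens.length; i++) {
                tokens[i] += tokensPerNeighbour;
                if (tokens[i] > capacity) {
                    spill += tokens[i] - capacity; // overflow spills
                    tokens[i] = capacity;
                }
            }
            // Spilled tokens go to the emptiest bucket, so bandwidth
            // follows demand when there is spare capacity.
            while (spill > 0) {
                int emptiest = 0;
                for (int i = 1; i < tokens.length; i++)
                    if (tokens[i] < tokens[emptiest]) emptiest = i;
                if (tokens[emptiest] >= capacity) break; // all full
                long add = Math.min(spill, capacity - tokens[emptiest]);
                tokens[emptiest] += add;
                spill -= add;
            }
        }

        // A request is accepted iff the neighbour's bucket holds
        // enough tokens for its size in bytes.
        synchronized boolean accept(int neighbour, long size) {
            if (tokens[neighbour] < size) return false;
            tokens[neighbour] -= size;
            return true;
        }
    }
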
> 
> > And we don't need to track stats per-path.
> 
> Agreed. However, there should be an AIMD rate limit on each link, which

Yes.

> again can be implemented with a token bucket to allow bursts of traffic,
> giving two buckets per neighbour: one for incoming traffic (fairness)
> and one for outgoing traffic (rate control). 

Hmm. So when we want to send a request, we take a token from the
incoming bucket for the node which we received the request *from*, then
we take a token from the outgoing bucket for the node we would like to
send the request *to*. If we can acquire both, then we send it. If not,
we return any acquired tokens.
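
In code, the send path would be something like this (again a sketch:
TokenBucket, tryAcquire() and refund() are names I've just made up):

    // A token bucket reduced to the bare minimum for the sketch.
    class TokenBucket {
        private long tokens;
        TokenBucket(long initial) { tokens = initial; }
        synchronized boolean tryAcquire(long n) {
            if (tokens < n) return false;
            tokens -= n;
            return true;
        }
        synchronized void refund(long n) { tokens += n; }
    }

    class TwoBucketSend {
        // Take a token from the input bucket of the neighbour the
        // request came *from*, then from the output bucket of the
        // neighbour it should go *to*. Send only if both succeed;
        // otherwise return whatever was acquired.
        static boolean trySend(TokenBucket inboundFrom,
                               TokenBucket outboundTo,
                               long size, Runnable send) {
            if (!inboundFrom.tryAcquire(size)) return false;
            if (!outboundTo.tryAcquire(size)) {
                inboundFrom.refund(size); // give the input token back
                return false;
            }
            send.run();
            return true;
        }
    }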

This is akin to us not routing to a node because it is backed off, so
it should be safe - but will it result in increasingly perverse routing
under load, as we saw in 0.5? Probably not significantly, if load is
propagated properly... Obviously, if it isn't, the whole thing
collapses.

We need to separate the input and output buckets (the resulting
decision procedure is sketched after this list):
- If we don't have an input token, we should immediately send an
  FNPRejectedOverload{local=true} to the originator and discard the
  request.
- If we don't have an output token for one node, should we try the
  next-best? At present, backed off nodes are ignored for routing...
- As I understand the above, the input token bucket fills at a rate
  simply determined by the overall output capacity divided by the
  number of nodes, while the output token bucket fills at the rate
  determined by the AIMD for that node.
- So if we do a RejectedOverload whenever we can't get a token for the
  output bucket, either a) we have to kill the request on any
  RejectedOverload, or b) it will be retried on another node further
  from the target. I don't like b), and I'm not sure about a).
- If we simply move on to the next node (since backed off nodes are
  ignored for routing at present and that seems to work), then to some
  degree the load will go to where it is least overloaded...
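
Here is the decision procedure from the list above as a sketch,
reusing the made-up TokenBucket from earlier; Peer, RoutingResult and
route() are equally invented:

    enum RoutingResult { SENT, REJECTED_OVERLOAD_LOCAL, NO_ROUTE }

    class Peer {
        final TokenBucket outputBucket; // fills at this peer's AIMD rate
        Peer(TokenBucket b) { outputBucket = b; }
        void send() { /* hand the request to this peer */ }
    }

    class Router {
        // peersByDistance is sorted closest-first to the target key.
        static RoutingResult route(TokenBucket inboundFrom,
                                   java.util.List<Peer> peersByDistance,
                                   long size) {
            // No input token: FNPRejectedOverload{local=true} goes
            // back to the originator and the request is discarded.
            if (!inboundFrom.tryAcquire(size))
                return RoutingResult.REJECTED_OVERLOAD_LOCAL;
            // No output token for the best peer: move on to the
            // next-best, as we already do for backed off peers.
            for (Peer p : peersByDistance) {
                if (p.outputBucket.tryAcquire(size)) {
                    p.send();
                    return RoutingResult.SENT;
                }
            }
            inboundFrom.refund(size); // nobody could take it
            return RoutingResult.NO_ROUTE;
        }
    }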

AHA!

I know what the problem is with this scheme: Routing!

Not all peers are equally desirable for routing. The total capacity of
a node is not the sum of the AIMD-derived capacities of the peers it
is connected to; it is weighted by the _actual locations_ of those
peers, i.e. by how much of the keyspace each one is closest to.
Requests are uniformly distributed across the keyspace, but the
locations of an individual node's peers will not be evenly spaced
across it.
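
A toy example, assuming (purely for illustration) strict closest-peer
routing with no fallback: suppose peer A is closest for 50% of the
keyspace and peers B and C for 25% each, and each peer's AIMD allows
10 requests/sec. The capacities sum to 30 req/sec, but the node can
only sustain a total rate t satisfying t * 0.5 <= 10, i.e. t <= 20
req/sec, because A saturates first. Under that assumption the
achievable total is min over peers of (capacity / keyspace share),
not the sum.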

This doesn't matter if what we rely on is the output buckets, as seems
to be implied here. We can indeed reject requests whose preferred node
is not available; however, a very slow node at a good point in the
keyspace will then inflict great harm, and too many of these rejections
would be very bad, as the requests would get routed by the previous
node, which is presumably further from the target.

So it would seem we need another mechanism: backoff, perhaps.
Alternatively, we could make the preceding node reduce its request send
rate to us _without killing the request_. We could then route the
request to the sub-optimal node, but at a price. This should solve the
problem of sub-optimal routing, but the
very-slow-peer-with-big-chunk-of-keyspace problem remains. Basically
the only way to solve that is to ignore the node: perhaps some form of
backoff, or explicit detection of very slow nodes. Ideally the location
swapping algorithm would take slow nodes into account; there was an
idea for this, but THAT is not something I'm willing to play with
without input from Oskar and a lot of simulations...

> Add sliding window flow
> control to the links and it should be possible to prevent overload
> rather than using ping time to detect it.

What do you mean by sliding window flow control? Doesn't AIMD suffice?
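
For reference, the AIMD I have in mind is the usual additive-increase
/ multiplicative-decrease loop per link, feeding the fill rate of that
neighbour's output bucket - roughly this (another sketch; constants
and names are purely illustrative):

    // Per-link AIMD rate limit (illustrative constants and names).
    class AimdRateLimit {
        private double ratePerSec = 1.0;
        private static final double INCREASE = 0.1;  // additive step
        private static final double DECREASE = 0.5;  // multiplicative
        private static final double MIN_RATE = 0.1;

        synchronized void onSuccess() {          // request completed
            ratePerSec += INCREASE;
        }
        synchronized void onRejectedOverload() { // congestion signal
            ratePerSec = Math.max(MIN_RATE, ratePerSec * DECREASE);
        }
        synchronized double tokensPerSecond() {  // bucket fill rate
            return ratePerSec;
        }
    }
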
> 
> I'm planning to do some proper work on this after the p2p2006 deadline
> on the 1st of May... maybe Google will even pay me to work on it. ;-)

Maybe... IMHO, if we can implement a form of load balancing that
doesn't have the problems associated with the current one, then it
would be worthwhile for me to do it immediately; we just have to
thrash out the design... assuming, that is, that the design is 100%
bulletproof...
> 
> Cheers,
> Michael
-- 
Matthew J Toseland - [EMAIL PROTECTED]
Freenet Project Official Codemonkey - http://freenetproject.org/
ICTHUS - Nothing is impossible. Our Boss says so.
