MRogers and Anonymous have been arguing that we should limit load only at the 
immediate bandwidth level. Part of the thread is below, followed by my 
response, which includes a proposal for a bandwidth limiting mechanism:

----- mrogers at UU62+3E1vKT1k+7fR0Gx7ZN2IB0 ----- 2007.06.14 - 21:32:43GMT -----

No, what I've been arguing is that we don't need to *explicitly* take the 
lifetime overhead into account, because the feedback loop should be slow 
enough that it will *implicitly* take it into account anyway. If the feedback 
loop takes a few minutes to adapt (which it must if we don't want to 
over-adapt to short-term conditions) then we're implicitly measuring the 
lifetime overhead of a search, and we don't need to make an additional 
explicit estimate of the lifetime overhead.

----- toad at zceUWxlSaHLmvEMnbr4RHnVfehA ----- 2007.06.15 - 17:12:52GMT -----

What makes you think it would work at all? It seems to me that it would simply 
oscillate - we accept too many requests, most of them time out, then we accept 
too many requests again, and most of them time out again. KISS is one thing, 
but not slitting your own throat with Occam's razor is another!

----- mrogers at UU62+3E1vKT1k+7fR0Gx7ZN2IB0 ----- 2007.06.15 - 21:44:34GMT -----

:) If the feedback loop is slow enough it shouldn't oscillate - that applies 
equally to bandwidth liability estimation, token passing or any other 
mechanism - we mustn't get into the situation of saying "I accepted 100 
requests in the last 10 seconds and nothing bad has happened yet, therefore I 
can accept another 100". As far as I can see, the only way to avoid that 
situation is by adapting very slowly, i.e. on the order of the longest timeout. 
Luckily we expect the connections to be relatively long-lived (minutes or 
hours), so we can afford to take a few minutes to get up to speed.
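
Concretely, adapting "on the order of the longest timeout" could look like the 
minimal sketch below - the names are invented and this is only an illustration, 
not proposed code: the request limit is a smoothed estimate whose time constant 
matches the longest timeout, so a short burst of apparent success can't inflate 
the limit before the true cost of those requests is known.

// Minimal sketch (hypothetical names): a request limit adapted slowly,
// with a time constant on the order of the longest request timeout.
public class SlowRequestLimiter {
    private final double timeConstantMillis;   // e.g. the longest request timeout
    private double requestsPerSecLimit = 1.0;  // start conservatively
    private long lastUpdate = System.currentTimeMillis();

    public SlowRequestLimiter(double timeConstantMillis) {
        this.timeConstantMillis = timeConstantMillis;
    }

    /** Feed in the rate we actually sustained recently without timeouts. */
    public synchronized void reportSustainedRate(double observedRequestsPerSec) {
        long now = System.currentTimeMillis();
        double dt = now - lastUpdate;
        lastUpdate = now;
        // Smoothing factor is tiny when dt << timeConstant, so the limit moves slowly.
        double alpha = 1.0 - Math.exp(-dt / timeConstantMillis);
        requestsPerSecLimit += alpha * (observedRequestsPerSec - requestsPerSecLimit);
    }

    public synchronized double getLimit() {
        return requestsPerSecLimit;
    }
}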

Assuming that we have a suitably slow feedback loop in place, the next 
question is how to tell our peers when we're ready to accept another search. 
We could use tokens, preemptive rejection or blocking sockets. I don't have 
any religious opinions on this issue, but I thought Anonymous made a fairly 
good case for handling it in the same way as any other network app: don't 
read from the socket until you're ready to process more data. I realise 
there's a highly variable relationship between bytes in and bytes out, but 
regardless of which mechanism we use we'll have to rely on the slow feedback 
loop to smooth that out, because so far nobody's suggested a way of dealing 
with it that doesn't involve favouring some kinds of traffic over others. (I 
hope we agree that we don't want the whole network to end up favouring SSK 
traffic over CHK traffic just because it happens to come in smaller quanta. 
Maybe there are reasons for giving SSK traffic priority, but that isn't one 
of them.)
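
For what it's worth, the "don't read until you're ready" approach in its 
simplest form looks like the sketch below. This is purely illustrative - 
Freenet's transport is obviously not a plain TCP stream, and all names are made 
up - but it shows where the backpressure comes from: we stop reading the peer's 
socket until a processing slot is free, and the transport's flow control slows 
the sender down.

import java.io.DataInputStream;
import java.io.IOException;
import java.net.Socket;
import java.util.concurrent.Semaphore;

public class BackpressureReader implements Runnable {
    private final Socket peer;
    private final Semaphore slots; // one permit per request we can have in flight

    public BackpressureReader(Socket peer, int maxConcurrentRequests) {
        this.peer = peer;
        this.slots = new Semaphore(maxConcurrentRequests);
    }

    @Override
    public void run() {
        try (DataInputStream in = new DataInputStream(peer.getInputStream())) {
            while (true) {
                slots.acquire();              // block: don't read until we have capacity
                int length = in.readInt();    // length-prefixed request, for illustration only
                byte[] request = new byte[length];
                in.readFully(request);
                handle(request);
            }
        } catch (IOException | InterruptedException e) {
            // connection lost or shutting down - drop the peer
        }
    }

    private void handle(byte[] request) {
        new Thread(() -> {
            try {
                // ... route the request ...
            } finally {
                slots.release();              // free the slot so we read the next request
            }
        }).start();
    }
}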

----- toad at zceUWxlSaHLmvEMnbr4RHnVfehA ----- 2007.06.18 - 14:25:49GMT -----

It's not possible using the mechanism you propose *on its own*, although it *is* 
interesting. Here's why: I have 48kB/sec output. With your proposed 
mechanism, the fact that I have 1MB/sec input won't matter. So a leaf node 
requests a splitfile through me, which is all in the network, but is not in 
my datastore. I route his requests. It takes 5 seconds for enough of the 
requests to start transferring to make a difference. Over those 5 seconds, 
he's sent me 240kB of requests. That's around 240k/50 = 4,915 requests, which 
will yield 160MB of data. Unfortunately it will take me 160M/48k = nearly an 
hour to transfer all the data.
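
Spelling the arithmetic out: the ~50 bytes per request comes from the 240k/50 
figure above; the 32KiB per returned data block is an assumption I'm making 
here because it roughly reproduces the 160MB total.

// Back-of-the-envelope check of the numbers above.
public class LeafNodeExample {
    public static void main(String[] args) {
        double requestBytes = 240 * 1024;                 // 240kB of requests in 5 seconds
        double bytesPerRequest = 50;
        double requests = requestBytes / bytesPerRequest; // ~4,915 requests
        double bytesPerBlock = 32 * 1024;                 // assumed 32KiB per data block
        double dataBytes = requests * bytesPerBlock;      // ~154MB, i.e. roughly 160MB
        double outputBytesPerSec = 48 * 1024;             // 48kB/sec output
        double minutes = dataBytes / outputBytesPerSec / 60; // ~55 minutes, nearly an hour
        System.out.printf("%.0f requests -> %.0f MB -> %.0f minutes%n",
                requests, dataBytes / (1024 * 1024), minutes);
    }
}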

This is not acceptable, because we want Freenet requests to complete in a 
reasonable amount of time - if only for fproxy support, which IMHO is an 
important application. And I don't like the idea of having different request 
priorities either; it gives away information on the traffic and allows for 
Bad Things.

Imposing an arbitrary limit on the number of requests running is not the 
solution either. A limit may be imposed because of threads/memory usage, but 
this is likely to be heavily architecture dependent. The solution IMHO is to 
take into account likely future bandwidth usage. If we want to guarantee that 
there are no timeouts, this means bandwidth liability limiting. The current 
code will accept an SSK request only if it could also accept a CHK request, 
and vice versa, so I don't see any reason to think it is excessively biased 
in favour of CHKs. However if you like I will add code to collect the 
probability of rejection for SSKs and CHKs individually.
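
To be concrete about what I mean by bandwidth liability limiting, here's a 
sketch with made-up names (not the actual code): accept a request only if the 
worst-case data we would owe for it, plus everything already in flight, can 
still be sent within the timeout at our output rate.

public class BandwidthLiabilityLimiter {
    private final double outputBytesPerSec;
    private final double timeoutSec;
    private double bytesLiable = 0; // worst-case bytes owed for accepted requests

    public BandwidthLiabilityLimiter(double outputBytesPerSec, double timeoutSec) {
        this.outputBytesPerSec = outputBytesPerSec;
        this.timeoutSec = timeoutSec;
    }

    /** @param worstCaseBytes e.g. 32KiB for a CHK, much less for an SSK */
    public synchronized boolean tryAccept(double worstCaseBytes) {
        double limit = outputBytesPerSec * timeoutSec;
        if (bytesLiable + worstCaseBytes > limit) return false; // reject: would risk timeouts
        bytesLiable += worstCaseBytes;
        return true;
    }

    /** Called when a request completes, fails, or the data has been transferred. */
    public synchronized void release(double worstCaseBytes) {
        bytesLiable = Math.max(0, bytesLiable - worstCaseBytes);
    }
}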

For data blocks, clearly we cannot send one if there is no available space in 
the congestion window to the peer in question. However we may want the data 
for ourselves, or for multiple peers, in which case the slowest peer should not 
necessarily dictate the transfer rate; AFAICS we must accept the data as fast 
as we can manage within memory/cpu/etc limits, and limit the incoming 
bandwidth at a higher level.

So given that we must limit traffic on the request level (as well as the 
congestion control level), how can we do this? We can either:

A) Wait for a request, and then either accept it or reject it, based on e.g. 
how many requests are running already.
PRO:
- Simple
CON:
- Wasteful of time and bandwidth when we have to ask multiple peers before one 
accepts
- Difficult to adapt to propagate load back to source

Queueing doesn't help much here because we still have to complete the 
request - including the time it spends queued - in a reasonable time.

or

B) Tell our peers when they can send us a request. The obvious way to do this 
is to compute our overall quota of requests (or request-success-bytes), and 
allocate it to our peers (and tell them on every packet/response/etc), 
initially equally, but progressively allowing more to busier nodes and less 
to nodes that don't use their quota (but with diminishing returns: a node 
with a low quota that suddenly starts using more will take quota away from an 
established heavy requestor). Thus, initially, if most nodes aren't busy we 
won't run many requests, but as we identify those nodes which need a bigger 
quota, more requests run overall. Note that running fewer requests than we 
could isn't necessarily a problem anyway unless they complete really slowly.
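
One illustrative way to implement that allocation (names are made up, and this 
is a sketch rather than a full design): split the overall quota among peers in 
proportion to a concave function of their recent usage, so busier peers get 
more, but with diminishing returns - a lightly used peer that suddenly gets 
busier gains share at the expense of an established heavy requestor.

import java.util.HashMap;
import java.util.Map;

public class PeerQuotaAllocator {
    private final double totalQuota;  // overall requests (or request-success-bytes) per period
    private final Map<String, Double> recentUsage = new HashMap<>();

    public PeerQuotaAllocator(double totalQuota) {
        this.totalQuota = totalQuota;
    }

    /** Record how many requests the peer sent in the last period. */
    public void reportUsage(String peer, double used) {
        recentUsage.put(peer, used);
    }

    /** Concave weight: extra usage buys progressively less quota. */
    private double weight(String peer) {
        // +1 so idle peers still get a non-zero share to start from.
        return Math.sqrt(1.0 + recentUsage.getOrDefault(peer, 0.0));
    }

    /** Quota to advertise to this peer on the next packet/response. */
    public double quotaFor(String peer, Iterable<String> allPeers) {
        double totalWeight = 0;
        for (String p : allPeers) totalWeight += weight(p);
        if (totalWeight == 0) return 0;
        return totalQuota * weight(peer) / totalWeight;
    }
}

With no usage reported, every peer gets an equal share; as usage diverges, the 
shares follow it, but sublinearly.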

How to avoid excessive misrouting? 
Options:
1: We don't need to queue requests, as they are queued on the next node for us 
(running requests can be regarded as queued requests). When we get a request, 
we identify the set of nodes that we are willing to route it to, and if none 
of them have any free request slots, we reject it.
2: Queueing requests helps, because we can match requests with nodes. When we 
get a request (if it's allowed by the node's current quota), we queue it. 
When a slot opens in an outgoing node's quota, we send the queued request 
whose key is closest to that node's location (and which hasn't already been 
to that node), regardless of how long it's been queued (if multiple nodes 
have space, we keep picking the best request/node pair until they don't, or 
we run out of requests); there's a sketch of this matching loop after this 
list. If after a certain period a request hasn't been sent, we reject it. This 
avoids excessive misrouting without requiring arbitrary parameters (as the 
first option does), and sends more specialised requests to slower nodes. 
3: Simulated queueing: remember the locations of recent requests we could have 
sent to a node, and calculate a threshold, which decays over time if we don't 
accept a request (obviously this will result in fewer requests being sent, 
but lower latency).
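
Here's the matching loop from option 2 sketched out - types and names are made 
up, and the real thing would obviously need locking, deadlines, and the actual 
routing decision:

import java.util.ArrayList;
import java.util.List;

public class QueueMatcher {
    public static class Peer {
        final String name; final double location; int freeSlots;
        Peer(String name, double location, int freeSlots) {
            this.name = name; this.location = location; this.freeSlots = freeSlots;
        }
    }
    public static class Request {
        final double target; final List<String> visited = new ArrayList<>();
        Request(double target) { this.target = target; }
    }

    /** Circular keyspace distance on [0,1). */
    static double distance(double a, double b) {
        double d = Math.abs(a - b);
        return Math.min(d, 1.0 - d);
    }

    /** Route as many queued requests as the peers' free slots allow. */
    public static void match(List<Request> queue, List<Peer> peers) {
        boolean progress = true;
        while (progress && !queue.isEmpty()) {
            progress = false;
            Peer bestPeer = null; Request bestReq = null; double bestDist = Double.MAX_VALUE;
            for (Peer p : peers) {
                if (p.freeSlots <= 0) continue;
                for (Request r : queue) {
                    if (r.visited.contains(p.name)) continue;
                    double d = distance(p.location, r.target);
                    if (d < bestDist) { bestDist = d; bestPeer = p; bestReq = r; }
                }
            }
            if (bestPeer != null) {
                bestPeer.freeSlots--;
                bestReq.visited.add(bestPeer.name);
                queue.remove(bestReq);
                // ... actually forward bestReq to bestPeer here ...
                progress = true;
            }
        }
        // Requests still queued after a certain period would be rejected back.
    }
}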

Does this prevent flooding? Yes: we only accept, and offer, as many requests 
as we can handle, and with the right fairness algorithms, a flood will be 
limited to what the other peers don't need, although if our overall quota 
calculation is too generous flooding may still be an effective attack. 
Obviously on opennet, connecting to several hundred peers and flooding to 
each of them will be fairly effective; opennet sucks, we all know this, it's 
a transition phase.

Does this propagate load back to the originator? Not explicitly, as it's not 
an originator-throttles scheme (such schemes don't prevent flooding and may 
make it easier to identify the originator). However, the rate at which a node 
can send requests is quite clearly limited by the rate at which the rest of 
the network can handle them:

O - A - B - C

If B is a bottleneck, then O will only send requests at the rate that can be 
funneled through it (or satisfied on A).