On Saturday 02 Jul 2011 17:52:19 Ian Clarke wrote:
> On Sat, Jul 2, 2011 at 10:43 AM, Matthew Toseland <toad at 
> amphibian.dyndns.org
> > wrote:
> 
> > I don't think I fully understand load management, and I would appreciate
> > help from people who do! Unfortunately our theorists are mostly interested
> > in routing, and mostly tend to go away after a while anyway. It is also
> > possible that the problems on testnet are caused by simple bugs, I have
> > solved many but more must remain.
> 
> I think you are diving straight into the details of a solution, which makes
> it very difficult for anyone to really give you useful feedback.  We don't
> understand the goals, or how you arrived at this solution beginning from
> these goals.

You are probably right. I need to be clearer.
> 
> Consequently I think we need to take a step back and ask ourselves a few
> fundamental questions, in particular:
> 
> What would the perfect load-management system do?
> 
> In other words, what are our goals?
> 
> The challenge is that we have a scarce resource, which is a node's ability
> to process and forward requests and responses, and we wish to limit network
> load such that this scarce resource isn't exhausted, and is fairly allocated
> between requestors.
> 
> The extent to which this scarce resource is used, and where it is used, is
> determined by the rate at which  nodes spawn requests, the type of those
> requests, and how those requests are forwarded.

Your summary of our goals above is naive and leads to the current system, 
which, as I will show, is problematic at best.

Background (I don't expect to change these):
A. Each node has a limited capacity for running requests, and sometimes there 
are limits on the capacity between two nodes, e.g. due to network congestion. 
A node will not accept more requests than it can safely handle, and it will 
share the available capacity fairly between its peers.
B. Sometimes when a node rejects a request, it is because of a serious problem, 
such as running out of threads, or having such a high network ping time that 
it's impossible to do anything. Most of the time it is a "soft" rejection, 
because we are temporarily over our available capacity; these soft rejects 
occur essentially randomly.
C. Requests can be flagged as either bulk (high latency, high throughput) or 
realtime (low latency, low throughput). Load management strategy may vary for 
bulk vs realtime.
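
To make A-C concrete, here is a minimal sketch (in Java, since that is what 
the node is written in) of the accept/reject decision they imply. The class 
names and the numbers are illustrative assumptions, not the node's actual 
code:

    // A minimal sketch of the acceptance decision implied by A-C above.
    enum RequestClass { BULK, REALTIME }          // C: two load classes
    enum Decision { ACCEPT, SOFT_REJECT, HARD_REJECT }

    class AcceptDecider {
        // C: realtime gets a small low-latency allowance, bulk the rest.
        int capacity(RequestClass rc) {
            return rc == RequestClass.REALTIME ? 20 : 100;  // placeholders
        }

        Decision decide(RequestClass rc, int running,
                        boolean outOfThreads, long pingMillis) {
            // B: serious problems (no threads, huge ping) are hard rejects.
            if (outOfThreads || pingMillis > 2000) return Decision.HARD_REJECT;
            // A/B: being temporarily over capacity is a "soft" reject,
            // which from the sender's point of view occurs randomly.
            if (running >= capacity(rc)) return Decision.SOFT_REJECT;
            return Decision.ACCEPT;
        }
    }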

Goals:
1. As much of the available capacity at each node as possible should be used, 
when there is demand.
2. There should be a minimum of network-level overhead; e.g. sending a request 
50 times before it is accepted is bad.
3. The node originating a request should not be treated any differently from 
the other nodes on the network. In particular, we should not be dependent on 
the originator "playing by the rules".
4. Routing to the wrong place is not an acceptable way to "manage" load, as it 
increases overall load (it takes more hops to get to where we need to be). 
Worse, data may not be found at all if inserts are misrouted. On the other 
hand, when nodes are severely overloaded, or have far less capacity than their 
peers, some amount of misrouting must be tolerated. So we must be able to 
specify when it is acceptable to misroute.

The current load limiting system is based on the request originator using an 
AIMD and the round-trip time of requests to decide how many requests to start, 
combined with exponential backoff from peers on *all* rejects (not just hard 
rejects); a minimal sketch of both mechanisms follows the list below. The 
consequences of this are:
1. We do not generally use all the available capacity, even in cases where data 
can be fetched locally. Making the AIMD a true limit on the number of requests 
in flight rather than a rate might help a bit here.
2. Network level overhead is fairly good, if you ignore #4.
3. The node originating the request has to follow the AIMD algorithm. A node 
which turns off AIMDs and just sends requests can improve its success rate at 
the cost of causing backoff, and therefore misrouting, across the network. 
Also, there is a theoretical security issue: the request rate could 
potentially identify the originator, even at a distance (how serious this is 
remains unclear; certainly at one hop it's irrelevant, as there are much 
easier attacks).
4. Backoff happens as a normal consequence of the AIMDs learning the capacity 
of the network: if the senders are sending too many requests, backoff is high, 
and there is more misrouting. If the senders follow the rules this is 
manageable, but quite apart from the DoS possibility, the resulting misrouting 
causes poor data persistence.
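
For reference, a hedged sketch of the two mechanisms just described. The 
constants are illustrative, and a real AIMD also weights by round-trip time:

    // Originator side: additive increase, multiplicative decrease.
    class AimdWindow {
        private double window = 1.0;   // allowed requests in flight

        void onAccepted() { window += 1.0 / window; }               // additive
        void onRejected() { window = Math.max(1.0, window / 2.0); } // halve
        boolean maySend(int inFlight) { return inFlight < (int) window; }
    }

    // Peer side: exponential backoff, applied on *all* rejects
    // (not just hard ones), which is what produces the misrouting.
    class PeerBackoff {
        private long backoffMillis = 0;
        private long routableAt = 0;

        void onReject(long nowMillis) {
            backoffMillis = Math.max(1000, backoffMillis * 2); // doubles
            routableAt = nowMillis + backoffMillis;
        }
        boolean routable(long nowMillis) { return nowMillis >= routableAt; }
    }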

New load management does the following:

Load is managed at a node-to-node level. Everyone is permitted and encouraged 
to send requests at whatever rate they want to, constrained only by their 
peers' capacities. We tell our peers what our capacity is, by giving them 
information on how many requests we can accept. Up to some threshold (a peer's 
fair share), our peers are guaranteed to have their requests accepted; beyond 
that, they might be accepted if the overall usage is below a threshold. We 
calculate which peers we could reasonably send a request to (thus limiting 
misrouting), and then queue requests until we can send them to one of those 
peers; a sketch follows the list below. Hence:
1. We will use most of the available capacity.
2. There is slightly more overhead, because we can send the same request to a 
node twice. However, this is limited.
3. Load is managed on a purely hop-by-hop basis.
4. Misrouting is explicitly limited.
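
To illustrate, here is a rough sketch of the per-peer accounting and queueing 
loop described above. The Peer interface and all names are assumptions for 
illustration, not the real implementation:

    import java.util.List;

    // Peers advertise capacity; each peer's fair share is guaranteed,
    // anything beyond it is best-effort; a request queues until one of
    // the acceptable peers can take it.
    class NewLoadRouter {
        interface Peer {
            int runningHere();      // requests of ours it is running now
            int guaranteedShare();  // the fair share it has promised us
            int totalRunning();     // overall usage, as advertised
            int totalCapacity();    // overall capacity, as advertised
            void send(Object request);
        }

        static boolean canAcceptNow(Peer p) {
            if (p.runningHere() < p.guaranteedShare()) return true; // sure
            return p.totalRunning() < p.totalCapacity();        // best-effort
        }

        /** Queue until one of the acceptable peers has room. */
        static void route(Object request, List<Peer> acceptablePeers)
                throws InterruptedException {
            while (true) {
                for (Peer p : acceptablePeers) {
                    if (canAcceptNow(p)) { p.send(request); return; }
                }
                Thread.sleep(100); // really: wake on a capacity update
            }
        }
    }

Note that a request only ever waits for peers it could reasonably be routed 
to, which is what bounds the misrouting in point 4.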

The main problems with this are:
i) Queueing at each hop potentially increases latency significantly. This is 
not necessarily a big problem for bulk requests, especially if we can increase 
the node's capacity at the same time (one element of the capacity calculation 
is an arbitrary parameter: the time allowed to transfer all the data if all 
the accepted requests succeeded). However, I have not been able to find a 
theoretical basis for how long to expect delays to be. In practice, delays can 
be quite long, with large spikes and averages between hundreds of milliseconds 
and tens of seconds, but they do not appear to grow without bound.
ii) What happens when an island of nodes with massive capacity is connected to 
a larger group of nodes with low capacity? An ideal load management system 
would start only as many requests as can be accepted by the downstream 
network. New load management as above would start a lot of requests, most of 
which would then end up queued for a very long time on the edges. Since we 
have to have timeouts, this would cause the edge nodes to get backed off. If 
the island is large, the requests which are not relayed will bounce around 
inside the island until they DNF, and those keys will not be requested again 
for a long period, which is not ideal either. It's probably a viable first 
approximation, though. One possible mitigation: if the average queueing time 
for the remaining peer is above some threshold, send a message telling the 
previous node to try again after some period.

The first seems to be a less serious problem than it at first appeared. 
Further bugfixing and testing on the testnet should confirm that it isn't a 
big deal.

As for the second: 

This can and should be handled at the key level, because there may be 
different bottlenecks in different parts of the keyspace. We need to limit 
misrouting further (e.g. not try every peer just to see a RejectedLoop and 
then get a bogus DNF). And if we have been queued for more than some threshold 
time, and at least one of the peers we are waiting for is neither low-capacity 
nor severely backed off, and has accepted requests in that period, then we 
should return a RecentlyFailed message, telling the downstream nodes (and the 
originator) not to try that key again for a period (perhaps equal to the 
threshold, perhaps also a function of the amount of over-booking); this can be 
less than the full 30 minutes for a DNF.
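
A sketch of that RecentlyFailed rule; the Peer interface and the 30-second 
threshold are illustrative assumptions:

    import java.util.List;

    class RecentlyFailedDecider {
        static final long QUEUE_THRESHOLD_MS = 30_000; // example value

        interface Peer {
            boolean lowCapacity();
            boolean severelyBackedOff();
            boolean hasAcceptedWithin(long millis); // took other requests
        }

        /** Retry period in ms if we should fail fast, or null to wait. */
        static Long recentlyFailedPeriod(long queuedMillis,
                                         List<Peer> awaitedPeers) {
            if (queuedMillis < QUEUE_THRESHOLD_MS) return null;
            for (Peer p : awaitedPeers) {
                if (!p.lowCapacity() && !p.severelyBackedOff()
                        && p.hasAcceptedWithin(queuedMillis)) {
                    // Shorter than the 30-minute DNF period; could also
                    // scale with the amount of over-booking.
                    return QUEUE_THRESHOLD_MS;
                }
            }
            return null; // keep queueing
        }
    }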

Side-issue: error handling. We don't want to return a DNF in the case where 
the queue length is too long. DNF = Data Not Found, i.e. we searched for it, 
went to where we should have gone, and didn't find it. This is closer to an 
RNF (Route Not Found): we are (possibly temporarily) unable to find enough 
nodes to search for the content. However, if we treat it as an RNF, it will 
simply try every node inside the high-capacity island and then DNF because it 
has run out of HTL (if the island is big enough). Because it has DNF'ed, the 
key won't be tried again for a while (3 attempts every 30 minutes).
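
To summarise the distinction, a small sketch mapping failure causes to the 
message the downstream node should see (enum names are illustrative, not the 
actual message types):

    enum FailureCause {
        NOT_FOUND_WHERE_EXPECTED, NO_ROUTE_TO_TARGET, QUEUED_TOO_LONG
    }
    enum FailureMessage { DATA_NOT_FOUND, ROUTE_NOT_FOUND, RECENTLY_FAILED }

    class FailureMapper {
        static FailureMessage messageFor(FailureCause cause) {
            switch (cause) {
                case NOT_FOUND_WHERE_EXPECTED:
                    return FailureMessage.DATA_NOT_FOUND;  // a real DNF
                case NO_ROUTE_TO_TARGET:
                    return FailureMessage.ROUTE_NOT_FOUND; // an RNF
                case QUEUED_TOO_LONG:
                    return FailureMessage.RECENTLY_FAILED; // retry soon
                default:
                    throw new AssertionError(cause);
            }
        }
    }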

Closely related side-issue: new load management's restrictions on misrouting 
only apply to the nodes we haven't routed to. Hence we can try every peer 
other than the low-capacity exit node, just to collect RejectedLoops. Maybe 
this is a bad thing? On the other hand, on small networks it may be very 
important...

The inspirations are:
- The Jacobson paper on CCN ("Networking Named Content") points out that you 
don't need end-to-end / sender-side load limiting as in TCP if all data is a 
response to a request. It does not, however, explain how to decide how many 
requests a sender should originate, and it relies on dropping packets and the 
sender retrying after a timeout, not on queueing.
- Discussions on Frost and elsewhere over a period of years. There were 
various proposals, including token passing. There was a large and apparently 
knowledgeable group in favour of the absolute minimum of load management, 
which seemed to boil down to relying purely on hop-by-hop flow control. We 
don't have flow control because we buffer the whole data block on each hop, 
but IMHO new load management is close.
> 
> Perhaps if you can describe your proposal by starting with a goal, and then
> explaining how this proposal achieves this goal, it will help us all to
> think about it and understand what you are trying to achieve.
> 
> Ian.