On Mar 24, 2010, at 2:24 PM, Matthew Toseland wrote:

Currently our load management system works like this:
- At the level of any node, a request will be preferentially routed to the best node by location (or FOAF location), but if many nodes are backed off, we can route to the worst node by location. In other words there is no limit whatever on misrouting.
- At the level of the node originating requests, we attempt to estimate the capacity of the network in a manner similar to TCP (although more vaguely, as we operate on rates rather than actual windows).
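A minimal sketch of that routing rule (hypothetical names, Python purely for illustration): prefer the closest peer by location, fall through to progressively worse peers when better ones are backed off, with no bound at all on how far we misroute.

```python
def circ_dist(a, b):
    """Distance between two locations on the [0,1) keyspace circle."""
    d = abs(a - b)
    return min(d, 1.0 - d)

def route(key_loc, peers, backed_off):
    """peers: {name: location}; backed_off: set of peer names.

    Returns the best non-backed-off peer, however far it is from
    the ideal -- i.e. misrouting is unlimited."""
    ranked = sorted(peers, key=lambda p: circ_dist(peers[p], key_loc))
    for p in ranked:
        if p not in backed_off:
            return p
    return None  # every peer is backed off
```

For example, with peers at 0.1, 0.5 and 0.9 and a key at 0.12, backing off the best peer makes us route to the next-closest one, and so on down the whole table.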

Hence we do not limit misrouting, but we do limit load. And limiting load on the sender side is very expensive and inflexible, costing us a great deal of performance.

True, yet from a pragmatic perspective... is there really a way around this? Network theory: you can control what you send but not what you receive.

Recently the CCN paper, and long before that various people on Frost, have pointed out that we do not in fact need to limit load. We don't need to limit it at the sender side, and arguably we don't need to do more at the any-node level than basic limiting of the number of requests in flight. This is because every time data is transferred it is the result of a request, so congestion in the sense that TCP/IP deals with cannot really occur.

That is fascinating, I'll have to think about that. If the node were always busy, I guess it could not be any worse than at present.

Clearly we do need to limit the number of requests which are pending on any given node. If we don't, we will end up sharing a finite and often very small resource (the output bandwidth limit) among an unbounded and potentially very large number of requests.

And in order to limit the number of requests in-flight, we need to limit the number of requests which the node accepts. Isn't this already the case?
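That per-node cap on in-flight requests could be as simple as this (a hypothetical sketch, not the actual accept/reject logic):

```python
class NodeLoadLimiter:
    """Hypothetical in-flight request limiter: accept a request only
    while the number of pending requests is under a fixed cap, so a
    small output-bandwidth budget is never shared among an unbounded
    number of transfers."""
    def __init__(self, max_in_flight):
        self.max_in_flight = max_in_flight
        self.in_flight = 0

    def try_accept(self):
        if self.in_flight >= self.max_in_flight:
            return False  # reject: the sender should try another peer
        self.in_flight += 1
        return True

    def complete(self):
        # Called when a request finishes or is cancelled.
        self.in_flight -= 1
```

The interesting question is then what happens on rejection: retry elsewhere (misrouting) or queue (latency).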

Also, it would be good to directly limit misrouting by refusing to send requests to a node that is too far from the ideal for the key we are requesting. However, some nodes will always be very slow, and some nodes will be temporarily very slow. IMHO these are two different issues: nodes which are always very slow (e.g. due to being on dialup) should publish a very small capacity and be used occasionally when they can be used, and the fact that we can't use the very slow node for most requests that would in theory match it should not be a big concern with regards to misrouting. Whereas nodes which are temporarily very slow (e.g. temporarily suffering under massive CPU load) should just be ignored: they reject requests or time out, and we can back off. Hence backoff should only happen in critical cases (e.g. high ping time), and most of the time load is limited by the published request limit, which takes over from output bandwidth liability preemptive rejection.
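The two cases could be kept separate in code roughly like this (a hypothetical sketch; field and method names are mine): a permanently slow peer advertises a small capacity, while a temporarily slow peer only enters backoff on a critical signal such as a timeout or high ping.

```python
import time

class Peer:
    """Hypothetical peer state combining a published capacity
    (permanent slowness) with a backoff timer (temporary slowness)."""
    def __init__(self, published_capacity):
        self.published_capacity = published_capacity  # requests it will accept
        self.accepted = 0          # requests currently accepted by this peer
        self.backoff_until = 0.0   # monotonic time until which we ignore it

    def usable(self, now=None):
        now = time.monotonic() if now is None else now
        return (now >= self.backoff_until
                and self.accepted < self.published_capacity)

    def backoff(self, seconds, now=None):
        # Only invoked on critical signals (timeout, high ping), not on
        # routine "I'm at my published capacity" rejections.
        now = time.monotonic() if now is None else now
        self.backoff_until = max(self.backoff_until, now + seconds)
```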

I think this is the true problem. Slow nodes.... heterogeneous bandwidth limitations... possibly some unrecognized (unmeasured?) machine-performance issues.

I would like to resubmit my previous suggestion: a "pre-request" which calls for a node to make a very firm estimate as to the amount of time it would take to deliver a given key. If a follow-up "actual" request exceeds the estimate (or is remotely cancelled), the upstream node is penalized (via backoff), thus temporarily removing it from a healthy network.

Plus, it closely mirrors *real-life*... your boss says "when can you get this [very important] report to me", and you can indicate "given the amount of other bulk & important requests, and knowing how fast I can walk to your desk... 3 minutes".

For security we may need to turn a fraction of these request-estimates into actual (bulk) requests.

Even with these precautions we will need a heuristic for the degree of misrouting we will tolerate. Options:
- Always route to the ideal node, subject to the above limitations. Arguably this is best, provided we can exclude outliers in a satisfactory way.
- Allow routing to the top two or three nodes.
- Take a proportion of the routing table (e.g. top 25%).
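The three options reduce to different ways of slicing a ranked routing table; a minimal sketch (policy names are mine, for illustration only):

```python
def candidates(key_loc, peers, policy="ideal", k=3, fraction=0.25):
    """peers: {name: location}. Returns the peers we would consider
    routing to under each misrouting heuristic."""
    def dist(p):
        d = abs(peers[p] - key_loc)
        return min(d, 1.0 - d)  # distance on the circular keyspace
    ranked = sorted(peers, key=dist)
    if policy == "ideal":
        return ranked[:1]                 # only the best node
    if policy == "top-k":
        return ranked[:k]                 # top two or three nodes
    if policy == "fraction":
        return ranked[:max(1, int(len(ranked) * fraction))]  # e.g. top 25%
    raise ValueError(policy)
```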

In my idea, if the node receives a request (or a pre-request), we could query an arbitrary number of nodes and take the most favored one.

For bulk transfers we could take a measure of something other than the bottom-line time estimate, such as largest-%-capacity or largest-%-uptime of all links between the requestor and the datum.


[...snip...]

6. Realtime vs bulk flag

Add an extra flag to requests. If the flag is set to BULK, we proceed as before, with queueing and a relatively long timeout. If the flag is set to REALTIME, we need an instant decision, even if it means an instant failure. So we try to route to the top nodes - a slightly more generous heuristic than with bulk - and if we can't, we fail as if we had timed out. Another way to implement this would be to have a dedicated quota within the per-node limit for realtime requests. A third option would be to allow realtime requests to be queued, but for a much shorter period, and use stats to estimate whether that period will be exceeded and, if so, fail immediately.
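Combining the first two implementation options, a hypothetical per-node scheduler might look like this: bulk requests may queue, while realtime requests get an instant accept-or-fail decision against a dedicated quota inside the overall limit (all names and numbers illustrative):

```python
class Scheduler:
    """Hypothetical per-node request scheduler for the BULK/REALTIME flag."""
    def __init__(self, total_slots, realtime_slots):
        self.total_slots = total_slots        # overall per-node limit
        self.realtime_slots = realtime_slots  # dedicated realtime quota
        self.used = 0
        self.realtime_used = 0
        self.bulk_queue = []

    def submit(self, req_id, realtime):
        if realtime:
            if (self.used < self.total_slots
                    and self.realtime_used < self.realtime_slots):
                self.used += 1
                self.realtime_used += 1
                return "accepted"
            return "failed"  # instant failure, as if we had timed out
        if self.used < self.total_slots:
            self.used += 1
            return "accepted"
        self.bulk_queue.append(req_id)  # bulk tolerates waiting
        return "queued"
```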

Another idea... three classifications: realtime, fast, and bulk.

Realtime requests always occur at the full link speed between nodes and have an embedded "deadline" timestamp (which could be increased at the origin until the request succeeds). A node instantly rejects such a request if it would make any other transfers go over their time budget; otherwise all other transfers are (very temporarily) suspended for the [one] realtime transfer.
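The admission test for that deadline scheme could be sketched like so (hypothetical, assuming we know each active transfer's remaining time and deadline): accept a realtime transfer only if suspending everything else for its duration would not push any already-accepted transfer past its own deadline.

```python
def admit(new_duration_s, active, now=0.0):
    """active: list of (remaining_s, deadline_s) for ongoing transfers.

    Suspending an active transfer for the new realtime transfer delays
    its finish time by new_duration_s; reject if that would miss any
    existing deadline."""
    for remaining, deadline in active:
        if now + new_duration_s + remaining > deadline:
            return False  # would blow another transfer's time budget
    return True
```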

Hmmm... on second thought you would still need some kind of queuing or token system... or a realtime response could include the narrowest pipe; and a node could cancel any "slower" realtime requests.

--
Robert Hailey


_______________________________________________
Devl mailing list
[email protected]
http://osprey.vm.bytemark.co.uk/cgi-bin/mailman/listinfo/devl
