Ok, lets try it - but unless there is an obvious marked improvement, we must *remove* it (rather than just continuing to add complexity).

Ian.

On 10 Aug 2006, at 14:50, Matthew Toseland wrote:

On Thu, Aug 10, 2006 at 11:43:32AM -0700, Ian Clarke wrote:
Why use a median, rather than a mean?

I'm concerned about this, it increases (yet again) the complexity of  
our load balancing, it hasn't been simulated, and I think there will  
inevitably be unpredictable results.  For example, what happens if  
*all* nodes have an AcceptedFrac below the median?  This mechanism  
will just spiral down until all nodes are hardly accepting anything.

The current mechanism is spiraling down, and nodes are accepting hardly
anything. The median is defined as the middle value if you order the
values, so half the nodes will be above it and half below it. So we'll
always be able to send as many requests as we want to half the nodes.
The reason for the current lack of performance is overcompensation for
load, as evidenced by the low bandwidth usage. To be precise, it's
because backoff is used as a load limiting mechanism and not just a load
balancing mechanism.

We can *not* afford to fall back into the trap of adding more and  
more complexity to the point that nobody understands what is going  
on, especially in the area of load balancing which is one of the  
areas that is most difficult to get our heads around.

The theory is simple: We have propagation of overload back to the
original senders. They will slow down if they get lots of timeouts and
forwarded RejectedOverload's. That's load *limiting*. The job of load
*balancing* is to prevent all our traffic being routed to slow nodes
which then generate lots of RejectedOverload's and slow down the whole
network. We could take out load balancing completely, but that might
result in a significant slowdown due to slow nodes. What's even better
is that this scheme has fewer arbitrary constants than backoff.

I really feel that we lack a "theory" of load balancing, a simple  
conceptual model through which we can think about the problem and  
find a solution.  Rather, we come up with idea after idea based on  
vague and invariably flawed notions and assumptions, layering one on  
top of the other until nobody understands what is happening.

Load limiting: Sender side, determining how fast to send requests into
the network. If there are too many RejectedOverload's, the sender slows
down. This will ensure the network is not overloaded, full stop. It
works for TCP/IP, it should work for us. 

Load limiting without load balancing: If there are slow nodes near the
sender, and we send these an equal proportion of our incoming requests
(according to their location), then most of those requests will be
rejected, and this results in an overall slowdown.

Load balancing: Routing side. The purpose is to determine which nodes to
send requests to, in order that a) our requests are reasonably fast, and
b) we don't send too many requests to really slow nodes, causing an
overall slowdown via load limiting. Load balancing can legitimately be
"selfish" i.e. it can send a request even if all available nodes are
slow, because load LIMITING will reduce the overall send rate if the
overall network is slow; that's not load balancing's problem. Not being
"selfish" in this sense is a threat to anonymity (all nodes backed off
except one), usability (all nodes backed off), and performance, and is
not justified. TCP/IP runs over more than ethernet, and I don't think
ethernet collision detection is a viable model for backoff.

Conceptually this isn't a difficult model. And I don't think it could
possibly be any worse than current performance; right now we are plainly
overcompensating for load, the reason for this is backoff, which is a
means for both limiting and balancing load, when it should be solely a
balancing mechanism; we have a better limiting mechanism already.

We must place a moratorium on all but the most trivial "improvements"  
to load balancing unless they have been simulated.

That could be another month. Michael has rightly insisted on doing a
fairly low level simulation, which has resulted in us rearchitecting a
lot of the lower layers of Freenet; this is a good thing, but it takes
time. Token passing is a substantial change, which we should not deploy
without simulation. It should solve our current problems, but if it
takes another month for it to be ready, given the current situation of
massive overcompensation for load resulting in universally low bandwidth
usage, lots of backoff, and really slow inserts, it seems reasonable to
me to try the above. Actually I don't think that load limiting with no
backoff would do too badly, but it would be rather sensitive to modem
nodes etc.

Ian.

On 9 Aug 2006, at 07:30, Matthew Toseland wrote:

What is backoff for? It's not to prevent overloading nodes; nodes  
reject
requests pre-emptively. What it is for is this:

If we don't backoff slow nodes, then these slow nodes will have to
reject (or occasionally timeout) the requests that we do forward to
them. Those failures will be propagated back to the request  
senders, who
will have to reduce their send rate. In other words, if we don't avoid
sending requests to the slowest nodes, the requests we do send to them
will be rejected, and those rejections will cause a universal slowdown
in request rates.

Thus, I don't think that the current mechanism is sensible. Nodes  
can be
expected to take care of themselves, now that we have pre-emptive
rejection. I don't think the ethernet analogy works 100%; we don't
really have a shared medium.

What we are looking at here is simply a way to ensure that requests  
are
not forwarded to nodes who will inevitably reject them and thus  
cause an
overall slowdown at the requester end. This is completely different to
the case in 0.5, where requesters never slowed down.

So what am I proposing?

For each node, we track the proportion of requests which are accepted.
We calculate a median. For any node above the median, all requests are
forwarded to it: if most of our nodes are slow, that's not our  
problem,
it's our peers who want to forward requests through us who have to  
deal
with it (by not sending requests to us). For nodes below the  
median, we
artificially narrow their specializations:

While routing:

public double distance(PeerNode node, double target) {

double distance;
if(node.acceptedFrac >= medianAcceptedFrac) {
distance = rawDistance(node.location, target);
} else {
distance = rawDistance(node.location, target) *
medianAcceptedFrac / node.acceptedFrac;
}
return distance;
}


Now, to some degree, this is alchemy, in the sense that it hasn't been
simulated. However it will be some time before the new rate limiting
mechanism is available, and it appears a solid argument to me: If all
our peers are slow, that DOES NOT mean we shouldn't send requests to
them. Because we have load propagation, it is entirely legitimate to
send requests ANYWAY, and let upstream nodes deal with the slowness -
either the entire network is slow, in which case we reduce all the
send rates, or this particular node is slow, in which case we reduce
the number of requests sent to it! Note also that slavishly following
backoffs reduces our anonymity quite noticeably.

Comments?
-- 
Matthew J Toseland - [EMAIL PROTECTED]
Freenet Project Official Codemonkey - http://freenetproject.org/
ICTHUS - Nothing is impossible. Our Boss says so.

Ian Clarke: Co-Founder & Chief Scientist Revver, Inc.
phone: 323.871.2828 | personal blog - http://locut.us/blog




-- 
Matthew J Toseland - [EMAIL PROTECTED]
Freenet Project Official Codemonkey - http://freenetproject.org/
ICTHUS - Nothing is impossible. Our Boss says so.

Ian Clarke: Co-Founder & Chief Scientist Revver, Inc.
phone: 323.871.2828 | personal blog - http://locut.us/blog

Attachment: PGP.sig
Description: This is a digitally signed message part

_______________________________________________
Devl mailing list
[email protected]
http://emu.freenetproject.org/cgi-bin/mailman/listinfo/devl

Reply via email to