If I understood correctly, then currently, when a node queryrejects, the
QR travels back along the chain to the node that originated the request,
and that node then sends the request to a different node. If this is
true, it has the potential to royally mess up the network.


Assume a network where a significant fraction of requests result in QR.
Every time a request hits a queryrejecting node, it gets bounced back to
the node that requested it. That node then resends the request to a
different node.


z--a--b--c--d
      |  |
      f  e


Node d is queryrejecting. All of a's requests will first travel through b
and c, be rejected by d, and travel through c and b back to a. Because a
didn't get the data it requested, it will repeat its request to up to n
different nodes. n is effectively unbounded, because even if the node
itself limits the number of nodes tried, whatever software is originating
the queries (for example, FUQID) will simply keep retrying until it gets
the data. Failure tables won't really help; FUQID is trying to download a
600 MB file, and by the time it retries whatever block failed, the
failure-table entry for that block has expired. In other words, finding
the data only after several queryrejects causes more load on the network
than finding it without queryrejects (because we bother more nodes when
someone queryrejects).
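To put rough numbers on that, here is a back-of-the-envelope sketch (mine,
not node code) under two assumptions: each hop costs one message, and a QR
travels back over the same hops it came in on.

```python
# Load amplification caused by QR bounces, counted in messages on the wire.
# Assumptions (mine): one message per hop, QR returns along the same path.

def messages_without_qr(hops):
    """A successful request: the query travels `hops` links once."""
    return hops

def messages_with_qr(hops, rejected_attempts):
    """Each rejected attempt travels `hops` links out and `hops` back;
    the requester then retries, and we assume the final attempt succeeds
    over a path of the same length."""
    wasted = rejected_attempts * 2 * hops   # every bounce is a full round trip
    return wasted + hops                    # plus the eventually successful attempt

# z--a--b--c--d with d rejecting: 3 hops out, 3 back, per attempt.
print(messages_without_qr(3))       # 3
print(messages_with_qr(3, 5))       # 5 bounced attempts: 5*6 + 3 = 33
```

So even five bounces turn a 3-message request into a 33-message one, and
FUQID-style endless retrying pushes that factor higher still.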

The important thing to notice here is that passing the query causes load
on b and c, bringing them closer to queryrejecting themselves. This,
combined with the above, means that the more queryrejecting nodes there
are, the more likely any given node is to queryreject. The end result is
a cascade failure: when the load rises high enough, some nodes start to
queryreject, which causes more load on other nodes (localized mostly near
them in the network topology; since b returned a queryreject, a will next
try z), which causes some other nodes to fail, and so on.

Imagine, if you will, a sweeping wave of queryrejection going through the
entire network. Each node going into queryrejection mode will increase
the load on nearby nodes (not its immediate neighbours, but on node z on
the other side of the requesting node), causing them to fail too.
However, once every node in a given area (in network topology) starts
queryrejecting, the requests will be stopped in their tracks, the load
will go down, and the nodes will stop queryrejecting. Then imagine
several such "query-rejection waves" interacting in a 50-dimensional
network topology (assuming every node has fifty other nodes it routes
everything to; actually, since some dimensions are one-way (someone can
route to us without being in our routing table), and there are other
complications, saying it's 50-dimensional is a huge oversimplification).
The end result would then be like a crosswave: a node would fluctuate
seemingly randomly near the upper end of its maximum possible load, with
sudden, seemingly random huge spikes of overload and load drops. And this
behaviour will only get worse as Freenet gets bigger (more starting
points for waves, more "wavelengths" possible, a more complex
multi-dimensional structure).

This is obviously bad, since it is *impossible* to predict a node's
behaviour in a crosswave, and making reliable prediction of routing times
impossible makes NGRouting useless. So this behaviour has to be stopped.
But how?

One possible solution would be more tenacious intermediate nodes. When d
queryrejects, c should try e next, and only after exhausting its
possibilities should it return QR. Similarly, b should try f before
giving up. Please note, however, that we probably want to limit this
"elasticity" to trying two or three nodes before giving up; otherwise
there's a danger that the query walks through the entire routing table
and ends up being routed to the worst possible node instead of the second
or third best.
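As a sketch of what I mean (all names here are mine, not the node's
actual API), the elastic intermediate node would look roughly like this:

```python
# Hypothetical sketch of an "elastic" intermediate node: on QR, try the
# next-best route, but cap the number of alternatives so we never end up
# routing to the worst node in the table.

MAX_ALTERNATIVES = 3   # best route plus at most the 2nd- and 3rd-best

def route_request(request, routing_table):
    """routing_table is assumed sorted best-first by NGRouting estimate."""
    for candidate in routing_table[:MAX_ALTERNATIVES]:
        result = candidate.send(request)        # hypothetical call
        if result != "QueryRejected":
            return result                       # data found (or other terminal reply)
    return "QueryRejected"                      # elasticity exhausted: pass the QR upstream

# Toy demonstration: d rejects but e has the data, so c never returns QR.
class FakeNode:
    def __init__(self, reply):
        self.reply = reply
    def send(self, request):
        return self.reply

table = [FakeNode("QueryRejected"), FakeNode("DataFound"), FakeNode("DataFound")]
print(route_request("some-key", table))   # DataFound
```

The cap is the whole point: with it, the query degrades gracefully to the
second- or third-best route; without it, heavy rejection would walk the
query down to the bottom of the routing table.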

The other thing to do is to include a "time to live" in queryrejects;
when c gets the queryrejection from d, it should then know not to bother
d again for n seconds, freeing d from useless load. However, since c
itself is not overloaded, it should not attach any time to live if/when
it returns the queryreject to b.
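A minimal sketch of the bookkeeping this needs on c's side, assuming each
QR carries a "back off for n seconds" field chosen by the rejecting node
(again, the names are mine, purely for illustration):

```python
# Per-neighbour QR back-off table. Assumption (mine): each QueryRejected
# message carries a TTL in seconds, chosen by the rejecting node itself.
import time

class BackoffTable:
    def __init__(self):
        self._until = {}                      # node id -> wall-clock time we may retry

    def note_rejection(self, node_id, ttl_seconds, now=None):
        """Called when node_id sends us a QR with a time-to-live."""
        now = time.time() if now is None else now
        self._until[node_id] = now + ttl_seconds

    def is_routable(self, node_id, now=None):
        """False while node_id has asked to be left alone."""
        now = time.time() if now is None else now
        return now >= self._until.get(node_id, 0.0)

backoff = BackoffTable()
backoff.note_rejection("d", ttl_seconds=30, now=1000.0)
print(backoff.is_routable("d", now=1010.0))   # False: still inside d's 30 s
print(backoff.is_routable("d", now=1031.0))   # True: d's back-off has expired
```

Note that nothing here propagates upstream: when c passes the QR on to b,
it sends it without a TTL, exactly because c itself is not overloaded.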

Also, the appointment method discussed some time ago would probably be a
good idea.

Anyway, the key is to stop routing to queryrejecting nodes *immediately*
and *completely* for a time specified by them, to allow the network to
route around the waves, hopefully stopping the cascade reaction. This does
not create any new exploits, since the nodes asking for a little peace and
quiet aren't getting any extra power over others; they could simply
queryreject even without this extra ability.


Comments?

_______________________________________________
Devl mailing list
[EMAIL PROTECTED]
http://dodo.freenetproject.org/cgi-bin/mailman/listinfo/devl
