Jusa Saari wrote:
If I understood correctly, then currently, when a node queryrejects, the
QR goes back up the line to the node that originated the request, and that
node will then send it to a different node. If this is true, then it has
the potential to royally mess up the network.

No, you are confusing DataNotFound behavior with QueryReject. But yes, what is happening is screwing up the network royally.

When request traffic begins to route properly, most nodes should
see a fraction of their 'normal' queries per hour. This is
because the number of retries at a given node MULTIPLIES the
rate of those queries. So if I saw 30,000 queries/hour last
month, and we squoosh this QR bugaboo, I should see something like
5,000 to 15,000 queries/hour as a result. In fact I saw that
yesterday, with literally NO rejections for several hours.
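
A quick back-of-the-envelope check of those numbers: if every query is attempted k times on average, each attempt shows up somewhere as a separate query, so the observed rate is k times the underlying rate. Going from 30,000 down to 5,000 to 15,000 queries/hour therefore implies an average of two to six attempts per query. That factor is inferred from the quoted figures, not measured, and the function name is only illustrative.

```python
def retry_amplification(observed_per_hour: float, expected_per_hour: float) -> float:
    """Average number of attempts per query implied by an inflated query rate.

    Each retry is seen by some node as a separate query, so
    observed rate = underlying rate * average attempts per query.
    """
    if expected_per_hour <= 0:
        raise ValueError("expected rate must be positive")
    return observed_per_hour / expected_per_hour


if __name__ == "__main__":
    # Figures from the message: 30,000/hour observed, 5,000 to 15,000/hour expected.
    for expected in (15_000, 5_000):
        factor = retry_amplification(30_000, expected)
        print(f"{expected} queries/hour expected -> {factor:.0f}x amplification")
```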


z--a--b--c--d
      |  |
      f  e

Node d is queryrejecting. All of a's requests will first travel through b
and c, be rejected by d, and travel through c and b back to a. Because a

No, when d rejects, c will next try e. At each step, nodes will by default retry up to _40_ of their peers before they give up and send back RouteNotFound.

In the worst possible case, a query would visit 800 nodes. Something
around 100 is probably more likely. If things were working as designed,
fewer than 30 nodes would be used.

The important thing to notice here is that passing the query along puts load
on b and c, bringing them closer to queryrejecting themselves. This,
combined with the above, means that the more queryrejecting nodes
there are, the more likely any given node is to queryreject. The end result
is a cascade failure: when the load rises high enough, some nodes start to
queryreject, which puts more load on other nodes (localized mostly
near them in network topology; since b returned queryreject, a will next
try z), which causes some other nodes to fail, and so on.
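
The cascade can be reproduced with a deliberately crude toy model. The assumptions are for illustration only and are not how Freenet measures load: every node has the same capacity; a node above capacity starts queryrejecting, keeps only what it can handle, and its excess is retried on those neighbors that are not themselves rejecting. The topology is the z..f diagram from the message.

```python
from typing import Dict, List, Set, Tuple


def simulate_cascade(
    loads: Dict[str, float],
    capacity: float,
    neighbors: Dict[str, List[str]],
) -> Tuple[Set[str], Dict[str, float]]:
    """Toy model: overloaded nodes queryreject and their excess moves to neighbors."""
    loads = dict(loads)  # don't mutate the caller's dict
    rejecting: Set[str] = set()
    changed = True
    while changed:
        changed = False
        for node in loads:
            excess = loads[node] - capacity
            if excess <= 0:
                continue
            # The node handles what it can and rejects the rest.
            loads[node] = capacity
            rejecting.add(node)
            changed = True
            # Rejected queries are retried on neighbors that still accept;
            # if there are none, the excess is simply lost (those queries fail).
            targets = [n for n in neighbors[node] if n not in rejecting]
            for n in targets:
                loads[n] += excess / len(targets)
    return rejecting, loads


# The z--a--b--c--d line, with f hanging off b and e hanging off c.
TOPOLOGY = {
    "z": ["a"], "a": ["z", "b"], "b": ["a", "c", "f"],
    "c": ["b", "d", "e"], "d": ["c"], "e": ["c"], "f": ["b"],
}

if __name__ == "__main__":
    busy = {node: 9.0 for node in TOPOLOGY}  # every node near capacity
    busy["d"] = 14.0                          # d is overloaded
    print(sorted(simulate_cascade(busy, 10.0, TOPOLOGY)[0]))   # ['b', 'c', 'd', 'e']
    quiet = {node: 5.0 for node in TOPOLOGY}  # plenty of headroom
    quiet["d"] = 14.0
    print(sorted(simulate_cascade(quiet, 10.0, TOPOLOGY)[0]))  # ['d']
```

With every node at 9 of 10 units, one overloaded d drags c, e and b into queryrejecting, while z, a and f absorb the remainder; with every node at 5, the same overload on d stays local. That matches the "localized mostly near them" observation, although the numbers themselves are arbitrary.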

As far as we can tell, this is generally the problem...


Imagine, if you will, a sweeping wave of queryrejection going through the
entire network. Each node going to queryrejection mode will increase the

Unfortunately, we really don't need to imagine it :(


One possible solution would be more tenacious intermediate nodes. When d
queryrejects, c should try e next, and only after exhausting its

This is how it works today.


possibilities should it return QR. Similarly, b should try f before giving
up. Please note, however, that we probably want to limit this "elasticity"
to trying two or three nodes before giving up; otherwise there's a danger
that the query goes through the entire routing table and ends up being
routed to the worst possible node, instead of second or third best.

This is controlled by the maxRoutingSteps setting in the config file, and I agree that a number much lower than 40 is appropriate.
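
A minimal sketch of the bounded "elasticity" being discussed: try peers in order of preference and give up after a small number of attempts rather than 40. The `max_routing_steps` parameter only borrows the name of the maxRoutingSteps config setting; the numeric distance ranking, the `send` callback and everything else here are hypothetical, not Freenet's routing code.

```python
from typing import Callable, Optional, Sequence


def route_query(
    key: int,
    peers: Sequence[int],
    send: Callable[[int], bool],
    max_routing_steps: int = 3,
) -> Optional[int]:
    """Forward a query to the best few peers, stopping at the first that accepts.

    Peers are ranked by a hypothetical numeric distance to the key. `send`
    returns True if the peer accepted the query, False if it queryrejected.
    Returns the accepting peer, or None when the caller should give up and
    send a rejection back toward the requester.
    """
    ranked = sorted(peers, key=lambda peer: abs(peer - key))
    for peer in ranked[:max_routing_steps]:
        if send(peer):
            return peer
    return None


if __name__ == "__main__":
    rejecting = {10, 20}  # peers currently queryrejecting

    def send(peer: int) -> bool:
        return peer not in rejecting

    print(route_query(12, [50, 10, 40, 20, 30], send))  # 30: third-best peer accepts
    print(route_query(12, [50, 10, 40, 20, 30], send, max_routing_steps=2))  # None
```

Capping the attempts keeps a rejected query from sliding all the way down the routing table to the worst-placed peer, which is the trade-off described above.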

The other thing to do is to add a "time to live" to queryrejects: when
c gets the queryreject from d, it should then know not to bother d again
for n seconds, freeing d from useless load. However, since c itself is not
overloaded, it should not pass on any time to live if/when it returns the
queryreject to b.
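
The "time to live" proposal could be sketched as a per-peer backoff table. The `ttl_seconds` field on the reject message and every name below are hypothetical; the point is that c stops routing to d until the TTL expires, and that a node which is not itself overloaded relays the reject without a TTL.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class QueryReject:
    # Hypothetical field: how long the rejecting node asks to be left alone.
    ttl_seconds: Optional[float] = None


class BackoffTable:
    """Remembers, per peer, until when it asked not to be sent queries."""

    def __init__(self) -> None:
        self._until: Dict[str, float] = {}

    def on_query_reject(self, peer: str, qr: QueryReject, now: float) -> None:
        if qr.ttl_seconds:
            # Keep the later deadline if the peer is already backed off.
            self._until[peer] = max(self._until.get(peer, now), now + qr.ttl_seconds)

    def is_available(self, peer: str, now: float) -> bool:
        return now >= self._until.get(peer, now)


def relay_reject(overloaded: bool, ttl_seconds: float) -> QueryReject:
    """Reject to pass upstream: only a node that is itself overloaded attaches a TTL."""
    return QueryReject(ttl_seconds if overloaded else None)


if __name__ == "__main__":
    table = BackoffTable()  # c's view of its peers
    table.on_query_reject("d", QueryReject(ttl_seconds=5.0), now=100.0)
    print(table.is_available("d", now=103.0))  # False: d is still backed off
    print(table.is_available("e", now=103.0))  # True: c can try e instead
    print(relay_reject(overloaded=False, ttl_seconds=5.0))  # no TTL passed to b
```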

Also, the appointment method discussed some time ago would probably be a
good idea.

I think a punishment needs to be imposed on a node that doesn't back off. The punishment would be to DROP its requests for a while: give it the first QR explicitly, but then just ignore the rest for a while. And we could use the QR message to specify the rest of the command ("hold up for 3 seconds", "SLOW WAY DOWN, MAN"). We believe that not responding to requests would be handled properly by the requestor, with timeouts on that end.
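
The punishment rule could be sketched on the receiving side: while overloaded, the first request from a peer gets an explicit QR carrying a hold time, and anything else that peer sends before the hold expires is silently dropped, leaving the requester's own timeout to clean up. The class name, the fixed hold time and the string results are illustrative assumptions.

```python
from typing import Dict


class RejectThenDrop:
    """Explicit QR on the first offence, silence for the rest of the hold period."""

    def __init__(self, hold_seconds: float) -> None:
        self.hold_seconds = hold_seconds
        self._hold_until: Dict[str, float] = {}

    def handle(self, peer: str, now: float, overloaded: bool) -> str:
        """Return "accept", "reject" (explicit QR) or "drop" (no reply at all)."""
        until = self._hold_until.get(peer)
        if until is not None and now < until:
            # The peer was already told to hold off and did not: stay silent.
            return "drop"
        if overloaded:
            # Explicit QR telling the peer to hold up for hold_seconds.
            self._hold_until[peer] = now + self.hold_seconds
            return "reject"
        return "accept"


if __name__ == "__main__":
    gate = RejectThenDrop(hold_seconds=3.0)
    print(gate.handle("b", now=0.0, overloaded=True))   # reject
    print(gate.handle("b", now=1.0, overloaded=True))   # drop
    print(gate.handle("b", now=3.0, overloaded=False))  # accept
```

A dropped request looks to the requester like a slow or dead peer, so this relies on the requester's timeout handling, as the message assumes.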

Anyway, the key is to stop routing to queryrejecting nodes *immediately*
and *completely* for a time specified by them, to allow the network to

Not necessarily "completely", but definitely at a reduced rate.
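
Routing to a backed-off node "at a reduced rate" rather than not at all could be done with a per-peer token bucket: while the backoff lasts, at most `rate` queries per second get through. This is a standard rate-limiting technique swapped in for illustration, not something the thread specifies; the rate and burst values are arbitrary.

```python
from typing import Optional


class TokenBucket:
    """Lets through at most `rate` queries per second, with bursts up to `burst`."""

    def __init__(self, rate: float, burst: float = 1.0) -> None:
        self.rate = rate
        self.burst = burst
        self.tokens = burst
        self.last: Optional[float] = None

    def allow(self, now: float) -> bool:
        if self.last is not None:
            # Refill in proportion to elapsed time, capped at the burst size.
            self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(rate=0.5)  # one query every two seconds to the backed-off peer
    print([bucket.allow(t) for t in (0.0, 1.0, 2.0, 2.5)])  # [True, False, True, False]
```

A router would consult the bucket only while the peer is backed off and route to it normally otherwise.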


Comments?

I'm just glad to see someone else describing what I believe was/is a major problem, and fortunately it IS being addressed by the developers.

ken

_______________________________________________
Devl mailing list
[EMAIL PROTECTED]
http://dodo.freenetproject.org/cgi-bin/mailman/listinfo/devl
