On Mon, Nov 10, 2003 at 03:54:18AM -0500, Ken Corson wrote:
> Jusa Saari wrote:
> >If I understood correctly, then currently, when a node queryrejects, the
> >QR goes back in line to the node that originated the request, and that
> >node will then send it to a different node. If this is true, then it has
> >the potential to royally mess up the network.
> 
> - no, you are confusing DataNotFound behavior with QueryReject
> - yes, what is happening is screwing the network royally
> 
> when request traffic begins to route properly, most nodes should
> see a fraction of their 'normal' queries per hour. This is
> because the number of retries at a given node MULTIPLIES the
> rate of those queries. So if I saw 30,000 queries/hour last
> month, and we squoosh this QR bugaboo, I should see something like
> 5000 to 15000 queries/hour as a result. In fact I saw that
> yesterday, with literally NO rejections for several hours.
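
For what it's worth, the figures quoted above imply a rough retry multiplier. This is only back-of-envelope arithmetic on those numbers; the function name is made up for illustration and nothing here was measured on a real node.

```python
def implied_retry_multiplier(observed_per_hour, expected_per_hour):
    """Ratio of the observed query rate to the rate expected once retries stop."""
    return observed_per_hour / expected_per_hour

# 30,000 queries/hour observed vs. the 5,000 to 15,000 expected after the fix
high = implied_retry_multiplier(30000, 5000)
low = implied_retry_multiplier(30000, 15000)
print(f"retries inflate the query rate by roughly {low:.0f}x to {high:.0f}x")
```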
> 
> 
> >z--a--b--c--d
> >      |  |
> >      f  e
> >
> >Node d is queryrejecting. All of a's requests will first travel through b
> >and c, be rejected by d, and travel through c and b back to a. Because a
> 
> no, when d rejects, c will next try e. At each step, nodes will by
> default retry up to _40_ of their peers before they give up and send
> back RouteNotFound.
> 
> In the worst possible case, a query would visit 800 nodes. Something
> around 100 is probably more likely. If things were working as designed,
> fewer than 30 nodes would be used.
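
The retry behaviour described above can be sketched roughly as follows. This is a toy model, not the node's actual routing code: the peer ordering, the `route` function and the way a rejection is represented are all assumptions for illustration. Only the topology (from the diagram) and the default of 40 retries come from the thread.

```python
MAX_ROUTING_STEPS = 40  # default per-node retry limit mentioned above

# Topology from the diagram: z--a--b--c--d, with f hanging off b and e off c.
# Each peer list is in an ASSUMED order of routing preference.
PEERS = {
    "a": ["b", "z"],
    "b": ["c", "f"],
    "c": ["d", "e"],
    "d": [], "e": [], "f": [], "z": [],
}

def route(node, rejecting, has_data, visited, max_steps=MAX_ROUTING_STEPS):
    """Return the path from node to the data, or None.

    None stands in for both QueryReject and RouteNotFound in this toy model.
    Every node the query touches is appended to visited.
    """
    visited.append(node)
    if node in rejecting:
        return None  # this node queryrejects
    if node in has_data:
        return [node]
    tried = 0
    for peer in PEERS[node]:
        if tried >= max_steps:
            break  # retry budget at this node is used up
        if peer in visited:
            continue  # do not loop back to a node already on the path
        tried += 1
        path = route(peer, rejecting, has_data, visited, max_steps)
        if path is not None:
            return [node] + path
    return None  # all permitted peers failed: RouteNotFound

visited = []
print(route("a", rejecting={"d"}, has_data={"e"}, visited=visited))
print(visited)
```

In this toy graph, with d rejecting and the data on e, the query from a succeeds via b, c and e after touching d once. Calling it with max_steps=1 makes the same query fail, because c gives up after d. It says nothing about how the real network behaves at either setting.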

Okay, you've convinced me. We need to get rid of QRs altogether as much
as possible, and let NGRouting do higher level load balancing over the
resulting long-term individual node overload. Right?
> 
> >The important thing to notice here is that passing the query causes load
> >in b and c, bringing them closer to queryrejecting themselves. This,
> >combined with the above, means that the more queryrejecting nodes
> >there are, the more likely any given node is to queryreject. The end result
> >is a cascade failure: when the load rises high enough, some nodes start to
> >queryreject, which will cause more load to other nodes (localized mostly
> >near them in network topology; since b returned queryreject, a will next
> >try z), which causes some other nodes to fail, and so on.
> 
> this is generally the problem so far as we can tell...
> 
> >Imagine, if you will, a sweeping wave of queryrejection going through the
> >entire network. Each node going to queryrejection mode will increase the
> 
> unfortunately, we really don't need to imagine it :(
> 
> >One possible solution would be more tenacious intermediate nodes. When d
> >queryrejects, c should try e next, and only after exhausting its
> 
> this is how it is working today.
> 
> >possibilities should it return QR. Similarly, b should try f before giving
> >up. Please note, however, that we probably want to limit this "elasticity"
> >to trying two or three nodes before giving up; otherwise there's a danger
> >that the query goes through the entire routing table and ends up being
> >routed to the worst possible node, instead of second or third best.
> 
> this is controlled by the maxRoutingSteps setting in the config file, and I
> agree that a number much lower than 40 is appropriate.

A number much lower than 40 will just result in the network not working
at all. Whereas now... it doesn't work much.
> 
> >The other thing to do is to include a "time to live" to queryrejects; when
> >c gets the queryrejection from d, it should now know not to bother d again
> >for n seconds, freeing d from useless load. However, since c itself is not
> >overloaded, it should not pass any time to live if/when it returns the
> >queryreject to b.
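
A minimal sketch of the "time to live on queryrejects" idea above, assuming a QR message can carry a hold-off value. The `QueryReject` class, its `hold_seconds` field and the `BackoffTable` are hypothetical names, not existing Freenet messages or classes.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class QueryReject:
    sender: str
    hold_seconds: float  # hypothetical field: how long the sender asks to be left alone

class BackoffTable:
    """Kept by each node: which peers have asked not to be bothered, and until when."""

    def __init__(self):
        self._until = {}  # peer name -> earliest time we may route to it again

    def record(self, reject, now):
        self._until[reject.sender] = now + reject.hold_seconds

    def may_route_to(self, peer, now):
        return now >= self._until.get(peer, 0.0)

def reject_upstream(me):
    """c is not overloaded itself, so the QR it returns to b asks for no hold-off."""
    return QueryReject(sender=me, hold_seconds=0.0)

# c receives a QR from d asking for 30 seconds of quiet
table = BackoffTable()
table.record(QueryReject(sender="d", hold_seconds=30.0), now=100.0)
print(table.may_route_to("d", now=110.0))  # still inside d's hold-off
print(table.may_route_to("e", now=110.0))  # e never rejected
```

Passing `now` in explicitly rather than reading the clock keeps the sketch easy to test; a real node would presumably use its own clock.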
> >
> >Also, the appointment method discussed some time ago would probably be a
> >good idea.
> 
> I think there is a punishment that needs to be imposed on a node if
> they don't back off. The punishment would be to DROP their requests
> for a while. Give them the first QR explicitly, but then just ignore
> the rest for a while. And we could use a QR msg to specify the rest of
> the command ("hold up for 3 seconds", "SLOW WAY DOWN, MAN"). We believe
> that not responding to requests would be handled properly by the
> requestor, with timeouts on that end.
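
The "first QR explicit, then drop" punishment above might look something like this on the overloaded node. It is a sketch under assumptions: `RejectGate` and the "reject"/"drop" return values are invented for illustration, and the node is assumed to stay overloaded for the whole period.

```python
class RejectGate:
    """Per-peer state on an overloaded node."""

    def __init__(self, hold_seconds):
        self.hold_seconds = hold_seconds
        self._quiet_until = {}  # peer name -> end of its hold-off window

    def handle(self, peer, now):
        """Return "reject" to send an explicit QR, or "drop" to ignore the request."""
        until = self._quiet_until.get(peer)
        if until is not None and now < until:
            # The peer was already told to hold off and did not: ignore it and
            # let the timeout on its side deal with the missing reply.
            return "drop"
        # First request, or the earlier hold-off has expired: answer explicitly.
        self._quiet_until[peer] = now + self.hold_seconds
        return "reject"

gate = RejectGate(hold_seconds=3.0)
print(gate.handle("c", now=0.0))  # explicit QR: "hold up for 3 seconds"
print(gate.handle("c", now=1.0))  # c did not back off, so it is ignored
print(gate.handle("c", now=3.0))  # window over, c gets an explicit QR again
```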
> 
> >Anyway, the key is to stop routing to queryrejecting nodes *immediately*
> >and *completely* for a time specified by them, to allow the network to
> 
> not necessarily "completely" but definitely at a reduced rate.
> 
> >Comments ?
> 
> Just glad to see someone else describing what I believe was/is a major
> problem, and fortunately it IS being addressed by developers.
> 
> ken

-- 
Matthew J Toseland - [EMAIL PROTECTED]
Freenet Project Official Codemonkey - http://freenetproject.org/
ICTHUS - Nothing is impossible. Our Boss says so.

_______________________________________________
Devl mailing list
[EMAIL PROTECTED]
http://dodo.freenetproject.org/cgi-bin/mailman/listinfo/devl
