On Jan 8, 2008, at 3:33 PM, Matthew Toseland wrote:
> On Saturday 05 January 2008 00:50, Robert Hailey wrote:
>> Interestingly (now that I have got the simulator running), this
>> 'general timeout' appears even in simulations between nodes on the
>> same machine. Unless I coded something wrong, perhaps there is an
>> added delay or missing response somewhere which is not obvious?
>
> Entirely possible. Fixing it would be better than an arbitrary cutoff
> when we are still able to potentially find the data, and still have
> enough HTL to do so.
On Jan 8, 2008, at 2:27 PM, Matthew Toseland wrote:
> On Friday 04 January 2008 18:32, Robert Hailey wrote:
>> Apparently, until this revision (16886), a node would take as long
>> as necessary to exhaust its routable peers (so long as no single
>> node timed out) -- even long after the original requestor had given
>> up on that node.
>
> Is there any evidence that this happens in practice? Surely the HTL
> should prevent excessive searching in most cases?
There is, in fact. The timeout itself (which I have been running on my
node for a while) is evidence of the behavior (which to me seems
incorrect).
Jan 08, 2008 20:03:41:146 (freenet.node.RequestSender, RequestSender for UID -3998139406700477577, ERROR): discontinuing non-local request search, general timeout (6 attempts, 3 overloads)
...
Jan 08, 2008 20:12:21:226 (freenet.node.RequestSender, RequestSender for UID 60170596711015291, ERROR): discontinuing non-local request search, general timeout (1 attempts, 3 overloads)
You see... in the first log statement the node tried six peers before
running out of time. In the second case (which occurs quite
frequently), the node spent the entire 2 minutes waiting on a response
from a single peer (FETCH_TIMEOUT); if it were allowed to continue to
the next peer, it could (about 65% of the time) spend another 2 minutes
on just that node. To the best of my knowledge, none of the upstream
nodes will respond with a LOOP rejection before then. And even well
before the worst case, this effect can accrue across many nodes in the
path.
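
To make this concrete, here is a minimal sketch of the overall-timeout
idea; the names, constants, and helper interface below are my own
assumptions for illustration, not the actual RequestSender code:

import java.util.List;

// Sketch: one overall deadline across all peers, alongside the existing
// per-peer FETCH_TIMEOUT. All names/constants here are hypothetical.
public class GeneralTimeoutSketch {
    static final long FETCH_TIMEOUT = 2 * 60 * 1000L;   // 2 minutes per peer
    static final long OVERALL_TIMEOUT = 2 * 60 * 1000L; // roughly what the originator waits

    interface Peer { boolean fetch(String key, long timeoutMs); }

    static boolean search(String key, List<Peer> routablePeers) {
        long deadline = System.currentTimeMillis() + OVERALL_TIMEOUT;
        int attempts = 0;
        for (Peer peer : routablePeers) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                // The "general timeout" case from the log lines above: we may
                // still have routable peers and HTL left, but the originator
                // has almost certainly given up on us by now.
                System.err.println("discontinuing search, general timeout ("
                        + attempts + " attempts)");
                return false;
            }
            attempts++;
            if (peer.fetch(key, Math.min(FETCH_TIMEOUT, remaining)))
                return true; // data found
        }
        return false; // exhausted routable peers
    }
}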
> If the same request is routed to a node which is already running it,
> it rejects it with RejectedLoop. If it's routed to a node which has
> recently run it, it again rejects it. If it is a different request
> for the same key, it may be coalesced.
If you mean that the RECENTLY_FAILED mechanism would keep this in
check... I see this idea mentioned in many places, but I cannot see
where it is actually implemented. The only place I see that creates an
FNPRecentlyFailed message is in RequestHandler (upon its RequestSender
having received one).
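
For clarity, the loop bookkeeping I am describing amounts to something
like the sketch below: a bounded table of the last N completed request
UIDs, with repeats rejected. The class and method names are my own
assumptions, not the node's actual code:

import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of RejectLoop bookkeeping: remember the UIDs of the last 10000
// completed requests and reject any repeat. Structure is an assumption.
public class RecentRequestsSketch {
    private static final int MAX_COMPLETED = 10000;

    // LinkedHashMap with removeEldestEntry gives a simple bounded FIFO.
    private final Map<Long, Boolean> completed =
        new LinkedHashMap<Long, Boolean>() {
            protected boolean removeEldestEntry(Map.Entry<Long, Boolean> e) {
                return size() > MAX_COMPLETED;
            }
        };

    public synchronized boolean shouldRejectLoop(long uid) {
        return completed.containsKey(uid);
    }

    public synchronized void requestCompleted(long uid) {
        completed.put(uid, Boolean.TRUE);
    }
}

Note that at my node's throughput (about 10000 completed requests every
16 minutes), an entry only survives roughly 16 minutes before eviction.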
Presently the node will "RejectLoop" a request only if it is one of the
last 10000 completed requests. My node runs through that many requests
in about 16 minutes, so the loop-detection window is only about 16
minutes long. The logging statement above already shows that a request
can last longer than 2 minutes for one peer (and most nodes have 20
peers). If you assume that a request takes 4 minutes per node (two
peers tried; VERY optimistic), then it would take only 4 nodes 'near'
each other to generate a request livelock: each node tries two of its
other peers and then the next node in the 4-chain, and by the time the
request loops back to the first node it has already aged out of that
node's table, so (absent the HTL) the request would never drop from the
network.
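
Spelling out that arithmetic (the figures are observations and
assumptions from above, not universal constants):

// Back-of-the-envelope livelock check, using the figures above.
public class LivelockMath {
    public static void main(String[] args) {
        double tableLifetimeMin = 16.0; // ~10000 completed requests in ~16 minutes
        double perNodeMin = 4.0;        // two peers tried at 2 minutes each (optimistic)
        // Smallest loop of nodes whose round trip outlives the RejectLoop table:
        int minCycle = (int) Math.ceil(tableLifetimeMin / perNodeMin); // = 4
        System.out.println("A cycle of " + minCycle + " nodes outlives the"
                + " loop table; absent HTL, the request never dies.");
    }
}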
I do not think that the timeout I added is arbitrary. As I understand
Ian's original networking theory, a request is not valid after the
originator has timed out. In much the same way that a single node
fatally timing out collapses the request chain, so too should a node
'taking too long' collapse it (as that node *IS* the one fatally timing
out the chain).
But on the other hand, I do understand your point about the HTL, and
that it would keep the request from continuing indefinitely; even so,
it seems like it could be quite a waste of network resources. Certainly
beyond that point in time (where the requester has fatally timed out)
no response should be sent back to the source (such late responses
could account for many of the unclaimed FIFO packets); or perhaps one
should be sent only if the data is finally found (you mentioned ULPRs).
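
One way to express the rule I am arguing for (the field names are
hypothetical; the real messages would need to carry an origin
timestamp, or the node would need a conservative local estimate):

// Sketch: treat a request as dead once the originator must have timed
// out. ORIGINATOR_TIMEOUT and the origin-time field are assumptions.
public class RequestValiditySketch {
    static final long ORIGINATOR_TIMEOUT = 2 * 60 * 1000L; // assumed bound

    static boolean stillValid(long requestOriginTimeMs) {
        return System.currentTimeMillis() - requestOriginTimeMs < ORIGINATOR_TIMEOUT;
    }

    // On receiving a late reply (rejection, not-found, etc.):
    static void onReply(long requestOriginTimeMs, boolean dataFound) {
        if (!stillValid(requestOriginTimeMs) && !dataFound) {
            // Drop it: the source has given up, and relaying only adds to
            // the unclaimed-FIFO problem mentioned above.
            return;
        }
        // Otherwise forward the reply (found data could still feed ULPRs).
    }
}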
--
Robert Hailey