On Thursday 31 January 2008 19:29, Robert Hailey wrote:
>
> On Jan 31, 2008, at 12:00 PM, Matthew Toseland wrote:
>
> > On Thursday 31 January 2008 17:34, you wrote:
> >>
> >> On Jan 31, 2008, at 8:41 AM, Matthew Toseland wrote:
> >>
> >>> We are still getting timeouts. [...]
> >>> Any theories about the most likely cause?
> >>
> >> Considering the rather common occurrence of high-ping opennet
> >> peernodes,
> >
> > Oh?
>
> Every time I look at my opennet peers, I *always* have at least two
> with pings greater than 2 seconds. Right now, one with 4.5 secs, and
> one with 8.9 (the rest are sane).
Hmmm. Doesn't happen for me, although I only have 4 or 5 opennet peers.
It seems extraordinarily unlikely that this is real - either this is a
stats bug, or a message layer bug.
>
> >> my first suspect is that they are accumulated ping times and
> >> coalescing delays.
> >
> > Is that possible? 2 minute timeout, say 30 hops, it'd have to be
> > ~ 4 second round trip per hop, which doesn't happen much does it?
>
> I don't know, I have seen ping times to peers in excess of 12 seconds.
> How common are the timeouts anyway?

Fairly common here - many per 5 minute log period, probably a few per
minute.
>
> >> If this is the case, the only way I am aware of to
> >> solve it is to favor nodes with low ping times;
> >
> > Opennet favours nodes that get successful requests. I want to keep
> > alchemy out of it as much as possible, since this is what Oskar has
> > shown to work, and anyway it ought to effectively balance all the
> > other factors - if a node is too slow it won't generate successful
> > requests.
> >
> >> I actually already
> >> have a patch for that, although in its present incantation it also
> >> favors darknet nodes for routing (easily excised).
> >
> > :)
> >>
> >> My only other suspect is a bug in the message/link layer that drops
> >> messages.
> >
> > Entirely possible.
> >
> > The current link layer sucks, it is in need of a major rewrite.
> > There is one major bug which needs to be fixed (we increase the AIMD
> > transfer rate to a peer indefinitely while we are not maxing it out,
> > and then get a huge spike causing big problems when we do get more
> > traffic to send). But more generally, it's not as close to TCP as
> > I'd like it to be, and it has severe limits on packets in flight.
> > The packet format is also much more verbose than it needs to be.
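[Editorial sketch, not part of the original message.] The AIMD bug
described above (the rate keeps growing while the link is not being
maxed out) is commonly avoided by only applying the additive increase
when the sender was actually filling its window. The class below is a
minimal illustration under that assumption. The class name, the method
names and the one-packet minimum window are all hypothetical and are not
taken from Freenet's link layer.

```java
/** Hypothetical AIMD window; an illustration only, not Freenet's code. */
class AimdWindowSketch {
    // Assumed floor for the window, in packets.
    private static final double MIN_WINDOW = 1.0;

    private double windowPackets = MIN_WINDOW;

    /**
     * Called when a packet is acknowledged.
     * packetsInFlight is how many packets were outstanding when the
     * acknowledged packet was sent.
     */
    void onAck(int packetsInFlight) {
        // Additive increase, but only if the window was actually in use.
        // Without this check the window grows without bound while idle,
        // and the first burst of real traffic is sent far too fast.
        if (packetsInFlight >= (int) windowPackets) {
            windowPackets += 1.0 / windowPackets;
        }
    }

    /** Called when a packet is judged lost: multiplicative decrease. */
    void onLoss() {
        windowPackets = Math.max(MIN_WINDOW, windowPackets / 2.0);
    }

    double getWindowPackets() {
        return windowPackets;
    }
}
```

With this guard, acknowledgements that arrive while the link is mostly
idle leave the window unchanged, so there is no stored-up spike when
traffic resumes.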
> >
> > http://wiki.freenetproject.org/NewTransportLayer
> > http://wiki.freenetproject.org/NewPacketFormat
> >
> > Any such rewrite is highly unlikely to go in before 0.7.0. But if it
> > turns out to be relatively urgent it should be done soon after that.
> >>
> >> In the past while examining the throttle controls, I have suspected
> >> that (with priority queues) the "90-seconds at full throttle"
> >> constant might actually reduce to taking on too many concurrent chk
> >> transfers for them all to complete on time.
> >
> > Why? IIRC we include a fudge factor in that calculation, admittedly
> > it isn't very accurate and should be made more so by using stats on
> > bandwidth usage...
>
> Just that the CHKs all use the same throttle, so they all throttle
> down when we accept another CHK transfer.

Well sure, but if the mechanism is working we won't accept enough to be
a problem.
>
> >>> Do timeouts show up in simulation?
> >>
> >> I don't normally watch for them, I've started a new run with
> >> Accepted & Fatal request timeouts being logged. So far nothing.
> >
> > Ok.
>
> After running the simulator for two hours w/ ten nodes, I spot exactly
> one Accepted timeout (17 minutes into the simulation).
>
> So the answer is yes... timeouts still occur in the simulator.

Suggests a messaging bug, although it's possible it's an artifact of
Java's lack of thread priorities on *nix (i.e. cpu issues).
>
> >>> What can we do to debug this?
> >>
> >> Probably:
> >> (1) simulating the high ping times seen in the public network, at
> >> about the same rate,
> >
> > You mean bugs cause high ping times and high ping times cause
> > timeouts?
> >
> >> (2) a message/link layer stress test complete with rekeying/
> >> disconnects/and [busy/not-busy] spikes
> >
> > This would be a good idea, I dunno how much work would be involved?
> >
> > What can I usefully work on in this area? AFAICS:
> > - The window-grows-while-unused bug.
> > - More accurate bandwidth liability limiting.
> > - Debug the not-forwarded detection and make assumeNATed false by
> >   default. (Reduce baseload bandwidth usage).
> >
> > Anything else? You want to take any of these on?
>
> I don't think I can take on a big project right now.

Is there anything I can do?
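[Editorial sketch, not part of the original message.] The "90 seconds at
full throttle" check discussed in the thread can be read as: accept
another transfer only if everything already accepted, plus the new one,
could still be sent within 90 seconds at the output limit, scaled by a
fudge factor. The code below is a rough illustration of that reading
only. The method name, the parameters and the 32 KiB per-transfer size
are assumptions, and the real calculation in the node may differ.

```java
/** Hypothetical bandwidth liability check; not Freenet's actual code. */
class BandwidthLiabilitySketch {
    // The 90-second window is the constant mentioned in the thread.
    static final int LIABILITY_WINDOW_SECONDS = 90;
    // Assumed payload size of one CHK transfer (32 KiB).
    static final int CHK_TRANSFER_BYTES = 32 * 1024;

    /**
     * Returns true if one more transfer can be accepted without
     * committing to more bytes than the window allows.
     * fudgeFactor is the assumed usable fraction of the output limit.
     */
    static boolean canAccept(int transfersInProgress,
                             long outputBytesPerSecond,
                             double fudgeFactor) {
        double budgetBytes =
            outputBytesPerSecond * (double) LIABILITY_WINDOW_SECONDS * fudgeFactor;
        long committedBytes =
            (long) (transfersInProgress + 1) * CHK_TRANSFER_BYTES;
        return committedBytes <= budgetBytes;
    }
}
```

Under these assumptions, a 16 KiB/s output limit with a fudge factor of
1.0 allows at most 45 concurrent transfers. Because all transfers share
one throttle, each one accepted slows the others down, which is why the
accuracy of the fudge factor matters.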