[freenet-dev] Still getting timeouts

Matthew Toseland Thu, 31 Jan 2008 19:43:39 +0000

On Thursday 31 January 2008 19:29, Robert Hailey wrote:
> 
> On Jan 31, 2008, at 12:00 PM, Matthew Toseland wrote:
> 
> > On Thursday 31 January 2008 17:34, you wrote:
> >>
> >> On Jan 31, 2008, at 8:41 AM, Matthew Toseland wrote:
> >>
> >>> We are still getting timeouts. [...]
> >>> Any theories about the most likely cause?
> >>
> >> Considering the rather common occurrence of high-ping opennet
> >> peernodes,
> >
> > Oh?
> 
> Every time I look at my opennet peers, I *always* have at least two  
> with pings greater than 2 seconds. Right now, one with 4.5 secs, and  
> one with 8.9 (the rest are sane).


Hmmm. Doesn't happen for me, although I only have 4 or 5 opennet peers.

It seems extraordinarily unlikely that this is real - either this is a stats 
bug, or a message layer bug.
> 
> >> my first suspect is that they are cumulative ping times and
> >> coalescing delays.
> >
> > Is that possible? 2 minute timeout, say 30 hops, it'd have to be ~ 4 second
> > round trip per hop, which doesn't happen much does it?
> 
> I don't know, I have seen ping times to peers in excess of 12 seconds.  
> How common are the timeouts anyway?

Fairly common here - many per 5 minute log period, probably a few per minute.
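
The per-hop arithmetic quoted above (2 minute timeout, about 30 hops) can be checked with a short sketch. This is only an illustration: the function names and the 0.5 second "sane" round trip are assumptions, and only the 4.5 and 8.9 second pings come from the thread.

```python
def per_hop_budget(timeout_s: float, hops: int) -> float:
    """Average round trip per hop at which the summed delay reaches the timeout."""
    if hops <= 0:
        raise ValueError("hops must be positive")
    return timeout_s / hops


def path_exceeds_timeout(hop_rtts_s, timeout_s: float = 120.0) -> bool:
    """True if the summed per-hop round trips are longer than the timeout."""
    return sum(hop_rtts_s) > timeout_s


# 2 minute timeout over 30 hops: the average hop must reach this many seconds.
print(per_hop_budget(120.0, 30))

# Assumed path: 28 hops at 0.5 s plus the two slow peers reported above.
mostly_sane_path = [0.5] * 28 + [4.5, 8.9]
print(path_exceeds_timeout(mostly_sane_path))

# Every hop as slow as the 4.5 s peer: 30 * 4.5 s is over the timeout.
print(path_exceeds_timeout([4.5] * 30))
```

Under these assumptions a couple of slow peers on an otherwise healthy path stays well under the timeout; most of the hops would have to be slow for accumulated ping time alone to explain it.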
> 
> >> If this is the case, the only way I am aware of to
> >> solve it is to favor nodes with low ping times;
> >
> > Opennet favours nodes that get successful requests. I want to keep alchemy
> > out of it as much as possible, since this is what Oskar has shown to work,
> > and anyway it ought to effectively balance all the other factors - if a node
> > is too slow it won't generate successful requests.
> >
> >> I actually already
> >> have a patch for that, although in its present incarnation it also
> >> favors darknet nodes for routing (easily excised).
> >
> > :)
> >>
> >> My only other suspect is a bug in the message/link layer that drops
> >> messages.
> >
> > Entirely possible.
> >
> > The current link layer sucks, it is in need of a major rewrite. There is one
> > major bug which needs to be fixed (we increase the AIMD transfer rate to a
> > peer indefinitely while we are not maxing it out, and then get a huge spike
> > causing big problems when we do get more traffic to send). But more
> > generally, it's not as close to TCP as I'd like it to be, and it has severe
> > limits on packets in flight. The packet format is also much more verbose
> > than it needs to be.
> >
> > http://wiki.freenetproject.org/NewTransportLayer
> > http://wiki.freenetproject.org/NewPacketFormat
> >
> > Any such rewrite is highly unlikely to go in before 0.7.0. But if it turns
> > out to be relatively urgent it should be done soon after that.
> >>
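
The window-grows-while-unused problem described above can be sketched as follows. This is not the Freenet link layer: the class, its parameters and the "only grow when window-limited" guard are illustrative assumptions, similar in spirit to TCP congestion window validation (RFC 2861).

```python
class AimdWindow:
    """Toy AIMD congestion window, counted in packets (hypothetical, not Freenet code)."""

    def __init__(self, initial=4.0, increase=1.0, decrease=0.5, minimum=1.0):
        self.window = initial
        self.increase = increase
        self.decrease = decrease
        self.minimum = minimum

    def on_ack(self, in_flight: int, guard_unused: bool = True) -> None:
        """Additive increase; in_flight is the number of packets outstanding."""
        if guard_unused and in_flight < int(self.window):
            # The sender is not filling the window, so this ack says nothing
            # about whether a larger window would be safe. Do not grow.
            return
        self.window += self.increase / self.window

    def on_loss(self) -> None:
        """Multiplicative decrease, never below the minimum window."""
        self.window = max(self.minimum, self.window * self.decrease)


unguarded = AimdWindow()
guarded = AimdWindow()
# A mostly idle peer: 10000 acks arrive while only 2 packets are ever in flight.
for _ in range(10000):
    unguarded.on_ack(in_flight=2, guard_unused=False)
    guarded.on_ack(in_flight=2)

print(round(unguarded.window), guarded.window)
```

Without the guard the idle peer ends up with a window far larger than anything it has tested, which is the precondition for the spike when traffic arrives. With the guard the window stays where it was.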
> >> In the past while examining the throttle controls, I have suspected
> >> that (with priority queues) the "90-seconds at full throttle" constant
> >> might actually reduce to taking on too many concurrent CHK transfers
> >> for them all to complete on time.
> >
> > Why? IIRC we include a fudge factor in that calculation, admittedly it
> > isn't very accurate and should be made more so by using stats on bandwidth
> > usage...
> 
> Just that the CHKs all use the same throttle, so they all throttle down
> when we accept another CHK transfer.

Well sure, but if the mechanism is working we won't accept enough to be a 
problem.
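
A rough sketch of the "90 seconds at full throttle" check under discussion, to show why it leaves little slack when all CHK transfers share one throttle. The function names, the 16 KiB/s output limit and the 32 KiB block size are assumptions for illustration, not values taken from the node's code.

```python
CHK_BLOCK_BYTES = 32 * 1024   # assumed size of one CHK transfer
LIABILITY_WINDOW_S = 90.0     # the "90 seconds at full throttle" constant


def max_concurrent(output_limit_bps: float, fudge: float = 1.0) -> int:
    """Most transfers whose total bytes fit in the window at full output rate.

    fudge > 1 inflates the assumed cost of each transfer (overhead, retransmits).
    """
    capacity = output_limit_bps * LIABILITY_WINDOW_S
    return int(capacity // (CHK_BLOCK_BYTES * fudge))


def completion_time_s(concurrent: int, output_limit_bps: float) -> float:
    """Seconds until all transfers finish if they share the throttle equally."""
    return concurrent * CHK_BLOCK_BYTES / output_limit_bps


limit = 16 * 1024  # assumed output limit in bytes per second
n = max_concurrent(limit)
print(n, completion_time_s(n, limit))

# With a fudge factor of 2 fewer transfers are accepted, leaving headroom.
n_fudged = max_concurrent(limit, fudge=2.0)
print(n_fudged, completion_time_s(n_fudged, limit))
```

With no fudge factor, accepting the maximum means every transfer finishes exactly at the 90 second mark, so any unaccounted overhead pushes them all late together. Whether that actually happens depends on how accurate the fudge factor is.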
> 
> >>> Do timeouts show up in simulation?
> >>
> >> I don't normally watch for them, I've started a new run with Accepted
> >> & Fatal request timeouts being logged. So far nothing.
> >
> > Ok.
> 
> After running the simulator for two hours w/ ten nodes, I spot exactly  
> one Accepted timeout (17 minutes into the simulation).
> 
> So the answer is yes... timeouts still occur in the simulator.

Suggests a messaging bug, although it's possible it's an artifact of Java's 
lack of thread priorities on *nix (i.e. CPU issues).
> 
> >>> What can we do to debug this?
> >>
> >> Probably:
> >> (1) simulating the high ping times seen in the public network, at about
> >> the same rate,
> >
> > You mean bugs cause high ping times and high ping times cause timeouts?
> >
> >> (2) a message/link layer stress test complete with rekeying/
> >> disconnects/and [busy/not-busy] spikes
> >
> > This would be a good idea, I dunno how much work would be involved?
> >
> > What can I usefully work on in this area? AFAICS:
> > - The window-grows-while-unused bug.
> > - More accurate bandwidth liability limiting.
> > - Debug the not-forwarded detection and make assumeNATed false by default.
> > (Reduce baseload bandwidth usage).
> >
> > Anything else? You want to take any of these on?
> 
> I don't think I can take on a big project right now.

Is there anything I can do?