Another 3 stack traces here, of a different lost lock (still around
PacketSender).

http://amphibian.dyndns.org/argh.2.txt

The obvious solution, and the one we have used in the past, would seem
to be LD_LIBRARY_PATH. Unfortunately there are systems on which this
causes a crash by itself, e.g. some Gentoo installs, and nextgens tells
me that some users seem to get the same bug on Windows, although this is
difficult to confirm as they can't easily get a stack trace.

For me this is triggered by inserts.

It is known to happen on 1.4.2 and 1.5.0_06 (Sun *and* Blackdown).

What we DO need to know is whether it happens on Windows. If you can get
a stack dump on a Windows node, watch for all nodes getting backed off
due to Timeout3 or AcceptedTimeout (the same reason on all or most
nodes), and take some stack dumps when that happens. Our past experience
is that this is an NPTL issue and therefore Linux-specific.
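
For anyone who cannot easily get a stack dump from the console, here is a
minimal sketch of checking for the symptom from inside the JVM. It is not
part of the Freenet code: the class name LostLockCheck and its methods are
made up for illustration. It assumes a 1.5 or later JVM, since it relies on
java.lang.management, which 1.4.2 does not have. It flags threads that are
BLOCKED on a monitor that no thread owns. A single sample can be a false
positive (the owner may have just released the lock), so it only reports a
thread that is stuck on the same lock in two samples taken a few seconds
apart.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper, not part of Freenet: looks for "lost lock" waiters. */
public final class LostLockCheck {

    private LostLockCheck() {}

    /**
     * Takes one sample of all live threads and returns thread id -> lock name
     * for every thread that is BLOCKED on a monitor with no owning thread.
     */
    public static Map<Long, String> sample() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<Long, String> suspects = new HashMap<Long, String>();
        // maxDepth 0: we only need the lock information, not the stack frames.
        ThreadInfo[] infos = mx.getThreadInfo(mx.getAllThreadIds(), 0);
        for (ThreadInfo info : infos) {
            if (info == null) {
                continue; // thread exited between the two calls
            }
            boolean blocked = info.getThreadState() == Thread.State.BLOCKED;
            // getLockOwnerId() returns -1 when no thread owns the lock.
            if (blocked && info.getLockOwnerId() == -1 && info.getLockName() != null) {
                suspects.put(Long.valueOf(info.getThreadId()), info.getLockName());
            }
        }
        return suspects;
    }

    /** Keeps only the threads that wait on the same ownerless lock in both samples. */
    public static Map<Long, String> persistent(Map<Long, String> first,
                                               Map<Long, String> second) {
        Map<Long, String> stuck = new HashMap<Long, String>();
        for (Map.Entry<Long, String> e : first.entrySet()) {
            if (e.getValue().equals(second.get(e.getKey()))) {
                stuck.put(e.getKey(), e.getValue());
            }
        }
        return stuck;
    }

    public static void main(String[] args) throws InterruptedException {
        Map<Long, String> first = sample();
        Thread.sleep(5000); // arbitrary gap between the two samples
        Map<Long, String> stuck = persistent(first, sample());
        for (Map.Entry<Long, String> e : stuck.entrySet()) {
            System.out.println("thread " + e.getKey()
                    + " blocked on unowned lock " + e.getValue());
        }
    }
}
```

To be useful this would have to run inside the node's own JVM, e.g. from a
periodic task, since it only sees the threads of the process it runs in. An
empty result does not prove the bug is absent, only that no thread was
stuck on an ownerless monitor at both sample times.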

The IBM JVM hasn't been tested yet. GCJ/GIJ should be immune, and
nextgens is working on that.

On Wed, May 24, 2006 at 11:27:30PM +0100, Matthew Toseland wrote:
> Observe the two stack traces here:
> http://amphibian.dyndns.org/argh.txt
> 
> Look at PacketSender in both cases. There were some seconds between
> them, but they're both the same. It has locked one lock, and it is
> waiting for the other. The other lock is not held by any thread.
> 
> This is accompanied by weird symptoms: every node is backed off because
> of an AcceptedTimeout.
> 
> In conclusion? The current 0.7 code triggers a JVM bug - at least on my
> machine - which kills us. I've seen the same thing with logging.
> 
> Any ideas for a way forward? Or any ideas for why I am wrong (I hope I
> am)? This is consistent, I just did another one, many minutes later. It
> always has:
> 
> "PacketSender thread for 0" daemon prio=1 tid=0x0825bbd8 nid=0x8c0
> waiting for monitor entry [0xb11ff000..0xb11ff5c0]
> at freenet.node.KeyTracker.getNextUrgentTime(KeyTracker.java:790)
> - waiting to lock <0x7ef4d718> (a
>   freenet.support.UpdatableSortedLinkedListWithForeignIndex)
> at freenet.node.PeerNode.getNextUrgentTime(PeerNode.java:641)
> - locked <0x7e129c78> (a freenet.node.PeerNode)
> at freenet.node.PacketSender.realRun(PacketSender.java:85)
> at freenet.node.PacketSender.run(PacketSender.java:47)
> at java.lang.Thread.run(Thread.java:595)
> 
> And in all 3 cases (and with the same problem with logging earlier),
> 0x7ef4d718 is not locked by any thread.
> 
> And it's not looping; it's the same lock it's trying to get, and the
> same lock it's got already, in all 3 cases.
> 
> This is with Sun Java 1.5.0_06.
> -- 
> Matthew J Toseland - toad at amphibian.dyndns.org
> Freenet Project Official Codemonkey - http://freenetproject.org/
> ICTHUS - Nothing is impossible. Our Boss says so.



