I have been having strange stability problems for a while. I used to run on 
RH7.0 with a home-brewed, heavily patched 2.2.25 kernel and Sun's JVM 
1.4.1-02. The node was not very stable. Sometimes it would stop talking to 
everything after a few minutes, sometimes after a few hours. It never remained 
operational for more than a day or so without a restart. The problem never 
really went away. I figured it was something about the setup I was running, 
probably a weird kernel issue. A typical symptom was that routing time went 
to 0ms and the node started rejecting all incoming connections.

So, I eventually upgraded to RH9.0 (running stock RH kernel at the moment) and 
Sun's JVM 1.4.2-b28. Now things are far, far worse. The typical uptime is 
somewhere in the region of 20 minutes before the node dies and has to be 
restarted. There are two things that typically happen. One is an outright 
crash, which generates an error output file like the one attached. It is the 
same every time (different PID, obviously).

When it is not an outright crash, the node simply stops responding. Its CPU 
usage goes to nearly 0% (it normally consumes all cycles available to it, 
running at nice -n 15). Nothing happens until I kill the Java process and 
restart it.

It has gotten so bad that I have a cron job restarting the node once per 
hour. I also noticed that there are very few good/green nodes in my routing 
tables. There are a few red nodes with only a handful of consecutive 
failures, but most are not even contacted. I am getting loads of incoming 
connections, though. I'm not sure what that implies. It could be a 
side-effect of the ridiculous node instability I am experiencing: the 
routing table doesn't get properly exercised before the node crashes.

This has been a problem since before v590 or so.

Is there any hope of a stability improvement on the horizon? The current level 
of stability is not really all that useful...

Regards.

Gordan

On Wednesday 23 July 2003 18:02, Toad wrote:
> Confirmed that at least tessierHK's node lockups, on build 5015 running
> on debian unstable standard Linux kernel 2.4.21. No NPTL, unless NPTL
> was merged into 2.4.21 (which I seriously doubt), and using the Sun JDK
> 1.4.1_02. Now trying 1.4.1_03.
>
> On Wed, Jul 23, 2003 at 02:35:09PM +0100, Toad wrote:
> > Two users on IIP, and many previous items - I want a headcount: who has
> > had the node freeze on them, while running Sun 1.4.1, on a non-NPTL
> > kernel? If it's more than 10, we should consider fixing Kaffe/GCJ,
> > because it's a major problem if nodes are not stable. We've been seeing
> > those BlockingQueue errors (indicating a synchronization problem) for
> > some time now... Interestingly, Reskill doesn't seem to be running NPTL
> > (he's on RH7.3)... TrueSeeker is running java 1.4.1, which should revert
> > to legacy threads...
> >
> > So is it a case of sticking our head in the sand again and telling
> > everyone to use Windows, and hoping they don't run into the
> > runaway-win-bug too?

An unexpected exception has been detected in native code outside the VM.
Unexpected Signal : 11 occurred at PC=0x40032F6F
Function=pthread_kill+0xF
Library=/lib/tls/libpthread.so.0

Current Java thread:
	at sun.nio.ch.NativeThread.signal(Native Method)
	at sun.nio.ch.SocketChannelImpl.implCloseSelectableChannel(SocketChannelImpl.java:630)
	- locked <0x4ebdac20> (a java.lang.Object)
	at java.nio.channels.spi.AbstractSelectableChannel.implCloseChannel(AbstractSelectableChannel.java:202)
	at java.nio.channels.spi.AbstractInterruptibleChannel.close(AbstractInterruptibleChannel.java:97)
	- locked <0x4ebdabf8> (a java.lang.Object)
	at freenet.transport.tcpConnection.close(tcpConnection.java:421)
	- locked <0x4ebdab88> (a java.lang.Object)
	at freenet.transport.AbstractSelectorLoop$CloseThread.run(AbstractSelectorLoop.java:713)

Dynamic libraries:
08048000-0804e000 r-xp 00000000 09:01 28082191   /usr/java/j2sdk1.4.2/bin/java
0804e000-0804f000 rw-p 00005000 09:01 28082191   /usr/java/j2sdk1.4.2/bin/java
40000000-40015000 r-xp 00000000 09:01 12828751   /lib/ld-2.3.2.so
40015000-40016000 rw-p 00014000 09:01 12828751   /lib/ld-2.3.2.so
40017000-4001f000 r-xp 00000000 09:01 70467597   /usr/java/j2sdk1.4.2/jre/lib/i386/native_threads/libhpi.so
4001f000-40020000 rw-p 00007000 09:01 70467597   /usr/java/j2sdk1.4.2/jre/lib/i386/native_threads/libhpi.so
40020000-4002b000 r-xp 00000000 09:01 12828694   /lib/libnss_files-2.3.2.so
4002b000-4002c000 rw-p 0000a000 09:01 12828694   /lib/libnss_files-2.3.2.so
4002c000-40037000 r-xp 00000000 09:01 13991941   /lib/tls/libpthread-0.34.so
40037000-40038000 rw-p 0000a000 09:01 13991941   /lib/tls/libpthread-0.34.so
4003a000-4003d000 r-xp 00000000 09:01 12828684   /lib/libdl-2.3.2.so
4003d000-4003e000 rw-p 00002000 09:01 12828684   /lib/libdl-2.3.2.so
4003f000-405eb000 r-xp 00000000 09:01 72564752   /usr/java/j2sdk1.4.2/jre/lib/i386/server/libjvm.so
405eb000-40645000 rw-p 005ab000 09:01 72564752   /usr/java/j2sdk1.4.2/jre/lib/i386/server/libjvm.so
40658000-4066a000 r-xp 00000000 09:01 12828688   /lib/libnsl-2.3.2.so
4066a000-4066b000 rw-p 00011000 09:01 12828688   /lib/libnsl-2.3.2.so
4066d000-4068e000 r-xp 00000000 09:01 13991939   /lib/tls/libm-2.3.2.so
4068e000-4068f000 rw-p 00020000 09:01 13991939   /lib/tls/libm-2.3.2.so
4068f000-40693000 rw-s 00000000 09:01 25575431   /tmp/hsperfdata_freenet/4510
40693000-406a3000 r-xp 00000000 09:01 69206053   /usr/java/j2sdk1.4.2/jre/lib/i386/libverify.so
406a3000-406a5000 rw-p 0000f000 09:01 69206053   /usr/java/j2sdk1.4.2/jre/lib/i386/libverify.so
406a5000-406c5000 r-xp 00000000 09:01 69206039   /usr/java/j2sdk1.4.2/jre/lib/i386/libjava.so
406c5000-406c7000 rw-p 0001f000 09:01 69206039   /usr/java/j2sdk1.4.2/jre/lib/i386/libjava.so
406c7000-406db000 r-xp 00000000 09:01 69206054   /usr/java/j2sdk1.4.2/jre/lib/i386/libzip.so
406db000-406de000 rw-p 00013000 09:01 69206054   /usr/java/j2sdk1.4.2/jre/lib/i386/libzip.so
40728000-4073e000 r--s 00000000 09:01 45711402   /usr/java/j2sdk1.4.2/jre/lib/sunrsasign.jar
4073e000-40819000 r--s 00000000 09:01 45711401   /usr/java/j2sdk1.4.2/jre/lib/jsse.jar
40819000-4082a000 r--s 00000000 09:01 45711393   /usr/java/j2sdk1.4.2/jre/lib/jce.jar
4082a000-40d83000 r--s 00000000 09:01 45711394   /usr/java/j2sdk1.4.2/jre/lib/charsets.jar
4195a000-4195d000 r--s 00000000 09:01 61849635   /usr/java/j2sdk1.4.2/jre/lib/ext/dnsns.jar
4195d000-4196a000 r--s 00000000 09:01 61849636   /usr/java/j2sdk1.4.2/jre/lib/ext/ldapsec.jar
4196a000-41a26000 r--s 00000000 09:01 61849639   /usr/java/j2sdk1.4.2/jre/lib/ext/localedata.jar
41a26000-41a42000 r--s 00000000 09:01 61849638   /usr/java/j2sdk1.4.2/jre/lib/ext/sunjce_provider.jar
41a42000-41c3b000 r--s 00000000 09:01 37732388   /var/lib/freenet/freenet.jar
41c3b000-41c4b000 r-xp 00000000 09:01 69206050   /usr/java/j2sdk1.4.2/jre/lib/i386/libnet.so
41c4b000-41c4c000 rw-p 0000f000 09:01 69206050   /usr/java/j2sdk1.4.2/jre/lib/i386/libnet.so
41c4c000-41c52000 r-xp 00000000 09:01 69206051   /usr/java/j2sdk1.4.2/jre/lib/i386/libnio.so
41c52000-41c53000 rw-p 00005000 09:01 69206051   /usr/java/j2sdk1.4.2/jre/lib/i386/libnio.so
41c61000-41c64000 r-xp 00000000 09:01 12828692   /lib/libnss_dns-2.3.2.so
41c64000-41c65000 rw-p 00003000 09:01 12828692   /lib/libnss_dns-2.3.2.so
41c65000-41c74000 r-xp 00000000 09:01 12828704   /lib/libresolv-2.3.2.so
41c74000-41c75000 rw-p 0000f000 09:01 12828704   /lib/libresolv-2.3.2.so
41c77000-41c97000 r--s 00000000 09:01 37732360   /var/lib/freenet/freenet-ext.jar
42000000-4212f000 r-xp 00000000 09:01 13991946   /lib/tls/libc-2.3.2.so
4212f000-42132000 rw-p 0012f000 09:01 13991946   /lib/tls/libc-2.3.2.so
42134000-43abf000 r--s 00000000 09:01 45711403   /usr/java/j2sdk1.4.2/jre/lib/rt.jar

Heap at VM Abort:
Heap
 def new generation   total 24192K, used 5624K [0x45ac0000, 0x474f0000, 0x493a0000)
  eden space 21568K,  26% used [0x45ac0000, 0x4603e248, 0x46fd0000)
  from space 2624K,   0% used [0x46fd0000, 0x46fd0000, 0x47260000)
  to   space 2624K,   0% used [0x47260000, 0x47260000, 0x474f0000)
 tenured generation   total 195108K, used 115983K [0x493a0000, 0x55229000, 0x65ac0000)
   the space 195108K,  59% used [0x493a0000, 0x504e3d48, 0x504e3e00, 0x55229000)
 compacting perm gen  total 16384K, used 5652K [0x65ac0000, 0x66ac0000, 0x69ac0000)
   the space 16384K,  34% used [0x65ac0000, 0x66045368, 0x66045400, 0x66ac0000)

Local Time = Mon Jul 21 13:51:05 2003
Elapsed Time = 3058
#
# The exception above was detected in native code outside the VM
#
# Java VM: Java HotSpot(TM) Server VM (1.4.2-b28 mixed mode)
#
