I'd been deleting the logs, so I don't have one right now ><. I did
change my scripts to save them though. As soon as it happens again I'll
have some data. It seems to take about a week or so of running from a
fresh start before I start to get problems.
Niall: thanks for the explanation. I figured they were probably Byte
arrays, but then I saw the Strings and that threw me off :).
Anyway as soon as I get some real data I'll post it to the list.
Thanks all!
-Josh
Aaron Smuts wrote:
Do you have any of the cache logs when this is happening?
I would turn the memory shrinker off (set the property to false), as a
start. I generally don't run with the memory shrinker on. But I'm
shooting in the dark.
Aaron
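For illustration, a minimal sketch of flipping that property from a script. It assumes the shrinker is controlled by a line of the form "...cacheattributes.UseMemoryShrinker=true" in the .ccf file (the thread does not show the file, so treat the property name and the file you pass in as assumptions), and disable_memory_shrinker is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: switch off the JCS memory shrinker in a .ccf properties file.
# Assumption: the shrinker is controlled by lines of the form
#   <prefix>.cacheattributes.UseMemoryShrinker=true
# The helper name is hypothetical and not part of JCS.

disable_memory_shrinker() {
    ccf_file="$1"
    # Write to a temporary copy first so a failed sed cannot truncate
    # the original file.
    tmp_file="$ccf_file.tmp.$$"
    sed 's/\(cacheattributes\.UseMemoryShrinker\)[ ]*=[ ]*true/\1=false/' \
        "$ccf_file" > "$tmp_file" && mv "$tmp_file" "$ccf_file"
}
```

Calling disable_memory_shrinker on the server's .ccf file and then restarting the cache server would apply the change; all other lines in the file are left as they were.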
--- Joshua Szmajda <[EMAIL PROTECTED]> wrote:
Ahh yes of course, it was the user requirement. Now I have a nice bunch
of data. This is interesting, but I'm not sure what the [B class is:
 num   #instances      #bytes  class name
---------------------------------------------------------------------
   1:       31419   284852480  [B
   2:        2277    19760264  [I
   3:       57834     3865240  [C
   4:       29628     1896192  org.apache.jcs.engine.ElementAttributes
   5:       57838     1388112  java.lang.String
...
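For readers puzzled by the same thing: names starting with "[" in a jmap histogram are the JVM's internal names for array classes, so [B is byte[], [I is int[] and [C is char[]. A small sketch of that mapping follows; decode_class_name is a hypothetical helper name, not something jmap provides.

```shell
#!/bin/sh
# Translate the JVM's internal array class names, as printed by
# "jmap -histo", into Java source-style type names.
# decode_class_name is a hypothetical helper for illustration.

decode_class_name() {
    case "$1" in
        '[Z') echo 'boolean[]' ;;
        '[B') echo 'byte[]' ;;
        '[C') echo 'char[]' ;;
        '[S') echo 'short[]' ;;
        '[I') echo 'int[]' ;;
        '[J') echo 'long[]' ;;
        '[F') echo 'float[]' ;;
        '[D') echo 'double[]' ;;
        '[L'*';')
            # Object arrays look like "[Lsome.package.Type;".
            name=${1#\[L}
            name=${name%;}
            echo "${name}[]"
            ;;
        *)
            # Anything else is an ordinary class name; pass it through.
            echo "$1"
            ;;
    esac
}
```

Read that way, the top row of the histogram is roughly 285 MB held in byte arrays.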
Niall Gallagher wrote:
Hmm :D
I just did a bit of digging. I've used this script on a few of our
servers in the past (32 and 64bit server VMs), but I just found a server
which gave me the exact same error message you got. That server, it
turns out, runs Java under a different user account to the one I was
logged into.
Try running the script from the exact same user account the JVM process
is running from. Even running from root didn't work for me on that
server; it had to be the exact same user account, which is surprising.
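A quick way to check for that mismatch before running the tools is sketched below. It is Linux-specific and assumes /proc is mounted; the two function names are hypothetical.

```shell
#!/bin/sh
# Sketch (Linux only): find out which account owns a process by looking
# at the owner of its /proc/<pid> directory, and compare it with the
# account running this shell. Function names are hypothetical.

# Prints the numeric uid that owns the given pid, or nothing if the
# pid does not exist.
process_owner_uid() {
    stat -c '%u' "/proc/$1" 2>/dev/null
}

# Succeeds only when the current user owns the given pid.
same_user_as_process() {
    owner_uid="$(process_owner_uid "$1")"
    [ -n "$owner_uid" ] && [ "$owner_uid" = "$(id -u)" ]
}
```

For example, "same_user_as_process 2007 || echo 'switch to the JVM account first'" would flag the situation described above before jmap or jstack is ever attempted.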
By the way those tools are documented here:
http://java.sun.com/j2se/1.5.0/docs/tooldocs/share/jmap.html
and
http://java.sun.com/j2se/1.5.0/docs/tooldocs/share/jstack.html
Basically, they're supposed to work on most platforms except Windows and
Linux Itanium, so unless you've got Itanium CPUs it should work for you.
On Wed, 2008-04-09 at 14:44 -0400, Joshua Szmajda wrote:
Hey Niall,
Thanks for your script, but I'm getting these errors:
./capture-diagnostics.sh RemoteCacheServerFactory
Capturing diagnostics for Java process "RemoteCacheServerFactory" (pid 2007)...
2007: Unable to open socket file: target process not responding or HotSpot VM not loaded
The -F option can be used when the target process is not responding
2007: Unable to open socket file: target process not responding or HotSpot VM not loaded
The -F option can be used when the target process is not responding
Saved diagnostics for "RemoteCacheServerFactory" to "RemoteCacheServerFactory-diagnostics.txt"
There must be something I'm missing when I'm running the cache server. I
noticed it uses the 'server' VM by default; maybe these debug commands
are only good for the client VM?
Thanks!
-Josh
Niall Gallagher wrote:
Hi Josh,
Can you modify your cron job to capture diagnostics before it restarts
the cache server?
Then you can post the diagnostics next time it happens. The script below
will capture diagnostics for you. We use something like this in-house
for troubleshooting (not specifically for JCS).
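One way to wire a capture step into such a cron job is sketched here. The actual cron job was never posted, so everything in it is an assumption: CPU_LIMIT, the ps-based CPU reading, and restart_cache_server are hypothetical placeholders, and only the capture-diagnostics.sh name comes from this thread.

```shell
#!/bin/sh
# Sketch of a cron-driven watchdog: capture diagnostics first, restart
# second. All names and thresholds here are hypothetical placeholders.

CPU_LIMIT=90

# True when the integer part of a CPU percentage (e.g. "97.3") is
# greater than the limit given as the second argument.
cpu_over_limit() {
    [ "${1%%.*}" -gt "$2" ]
}

# Placeholder: replace with whatever command restarts the cache server.
restart_cache_server() {
    echo "restart command for the cache server goes here"
}

# Arguments: the process name as listed by jps, and its pid.
check_and_restart() {
    name="$1"
    pid="$2"
    # Note: on Linux, ps reports CPU use averaged over the lifetime of
    # the process, so this is only a rough signal.
    cpu="$(ps -o %cpu= -p "$pid" | tr -d ' ')"
    if [ -n "$cpu" ] && cpu_over_limit "$cpu" "$CPU_LIMIT"; then
        # Capture while the process is still misbehaving, then restart.
        sh capture-diagnostics.sh "$name"
        restart_cache_server
    fi
}
```

The ordering is the point: once the server has been restarted, the stack traces and histogram that explain the spike are gone.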
You'll first have to run the JDK 'jps' command from either root, or the
user account which runs your cache server instance. This gives you the
"name" of your cache server JVM process, which you need to supply to the
diagnostics script as a command-line parameter. The script uses the name
to attach to the relevant JVM process.
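As a rough illustration of that lookup: jps prints one "<pid> <name>" line per JVM, so the pid can be picked out by matching the name column exactly. The helper name pid_for_jvm_name is hypothetical.

```shell
#!/bin/sh
# Sketch: pick the pid of the jps entry whose name matches exactly.
# Reads "<pid> <name>" lines, the format jps prints, on standard input.
# pid_for_jvm_name is a hypothetical helper name.

pid_for_jvm_name() {
    # Compare the whole name column so that a partial name such as
    # "RemoteCache" does not match; stop at the first hit.
    awk -v name="$1" '$2 == name { print $1; exit }'
}
```

Typical use would be something like "/usr/java/default/bin/jps | pid_for_jvm_name RemoteCacheServerFactory", run from an account that can see the JVM.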
I don't know what might be causing the problem for you. It could be a
bug in JCS, or it could be a memory issue. The diagnostics will help
identify the problem.
Save this as "capture-diagnostics.sh"...
-------
#!/bin/sh
# Saves the stack traces and class memory usage information for a
# Java process running on the machine to a diagnostics file.
#
# This script expects the name of the relevant Java process to be
# specified as a parameter. The name specified should match a Java
# process name as listed by running the JDK 'jps' command.
#
# Usage: sh capture-diagnostics.sh <name of process>
APP_NAME="$1"
JDK_LOCATION="/usr/java/default"
DUMP_FILE="$APP_NAME-diagnostics.txt"
# Look up the pid: jps prints one "<pid> <name>" line per JVM.
APP_PID=""
if [ -n "$APP_NAME" ]; then
    APP_PID="$("$JDK_LOCATION/bin/jps" 2> /dev/null | grep "$APP_NAME" | cut -d' ' -f1)"
fi
if [ "$APP_PID" = "" ]; then
    echo "ERROR: Can't determine pid of Java process name specified \"$APP_NAME\""
    echo "Usage: sh capture-diagnostics.sh <name of process as listed by jps command>"
    exit 20
fi
echo "Capturing diagnostics for Java process \"$APP_NAME\" (pid $APP_PID)..."
# printf is used for the file output because not every /bin/sh
# understands 'echo -e'.
printf 'Diagnostics for Java process "%s" (pid %s) as at %s:-\n' "$APP_NAME" "$APP_PID" "$(date)" >> "$DUMP_FILE"
printf '\nTop 30 memory-consuming classes:-\n' >> "$DUMP_FILE"
"$JDK_LOCATION/bin/jmap" -histo:live "$APP_PID" | head -n33 >> "$DUMP_FILE"
printf '\nThread stack traces:-\n' >> "$DUMP_FILE"
"$JDK_LOCATION/bin/jstack" "$APP_PID" >> "$DUMP_FILE"
printf '\n\n' >> "$DUMP_FILE"
echo "Saved diagnostics for \"$APP_NAME\" to \"$DUMP_FILE\""
-------
On Wed, 2008-04-09 at 10:11 -0400, Joshua Szmajda wrote:
Hey all,
I've got a JCS remote cache server running on a machine, and every now
and then it will spiral out of control and lock the machine. I have no
idea yet what's causing this; I've just put some extra measures in place
to capture the logs from when it happens. My solution at this point is a
cron job that checks now and then for excessive CPU usage and restarts
the cache server. I'd like to be able to not worry about it, though :).
Any suggestions?
Thanks!
-Josh
P.S. It's running on ubuntu-server (kernel 2.6.22-14-server).
I have up to 16 remote listeners connecting to any given region
(probably 20 application instances in all).
Puts grow at a rate of about 400 per second.
I pass these options to java: "-Xms128m -Xmx2000m"
And here's my simple remote.cache.ccf:
=== message truncated ===
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]