Bergquist, Brett wrote:
> Derby is giving me fits, but I just found a reproducible pattern. The
> problem I am having is that queries that are totally index based,
> with no change in the query plan, are becoming slow. Normally the
> query takes about 6 milliseconds, but it quickly degrades to 20
> seconds or more.
> Here is some background. The system is quite busy, with about 40%
> CPU utilization attributable to Derby. Used heap is around 2GB out of
> the 4GB available. The JVM is Oracle Java 1.6, running with the
> -XX:+UseParallelGC garbage collector.
> Thousands of statements are being executed, and most of these are
> new statements, so few prepared statements are probably being reused.
Is it possible to change this? Are all the statements really different,
or could they be rewritten to be parameterized? Even if individual
connections don't do:

    prepare
    execute prepared with params

and instead just do:

    execute statement with params

Derby has a cache of statements, and you could possibly eliminate a lot
of the class garbage.
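As a minimal JDBC sketch of the difference (the table T and its columns
here are hypothetical, purely for illustration): the parameterized form
lets Derby's statement cache hand back one compiled plan instead of
compiling, and generating classes for, every distinct SQL string.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class ParamDemo {
        public static void main(String[] args) throws Exception {
            // Hypothetical embedded database; adjust the URL for your setup.
            Connection conn = DriverManager.getConnection("jdbc:derby:demoDB");

            // Costly pattern: every distinct literal yields a new SQL string,
            // so each execution compiles a fresh plan and loads newly
            // generated classes:
            //   stmt.executeQuery("SELECT NAME FROM T WHERE ID = " + id);

            // Cheaper pattern: one SQL text, so Derby's statement cache can
            // return the already-compiled plan on every execution.
            PreparedStatement ps =
                    conn.prepareStatement("SELECT NAME FROM T WHERE ID = ?");
            for (int id = 0; id < 1000; id++) {
                ps.setInt(1, id);
                ResultSet rs = ps.executeQuery();
                while (rs.next()) {
                    rs.getString(1);
                }
                rs.close();
            }
            ps.close();
            conn.close();
        }
    }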
> Using jvisualvm, I can see new classes being created and loaded every
> second.
> jvisualvm is showing GC activity of about 1.5%, so the system is not
> being hammered by the garbage collector. Also, the “old” generation is
> growing really slowly.
> So what I found is that if I hit the “Perform GC” button in jvisualvm
> to perform an explicit garbage collection, the same query that was
> taking 20 seconds or more instantly comes back down to about 6
> milliseconds. Its processing time then starts to creep back up towards
> 20 seconds or more.
Is it possible that the JVM's ability to satisfy Derby's memory
requests just slows down as more garbage accumulates in the heap?
Garbage has always been a performance problem for Derby, so a lot of
its code is written to avoid creating garbage where possible. So the
page caches reuse the arrays backing the pages, and a scan of a million
rows uses a small number of row containers that are reused. I guess the
real question is whether the JVM can start taking on the order of
seconds to satisfy a memory request. If that happens, I could come up
with many places in Derby code where we might be holding latches that
could stall everyone, and some of those stalls might be busy waits, for
example in the:

    lock manager critical section
    buffer manager critical section
    log system
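One rough way to test that hypothesis outside Derby, as a sketch only
(allocation stall behaviour depends heavily on collector choice and
heap sizing), is to fill the heap with short-lived garbage and time
individual allocations:

    import java.util.ArrayList;
    import java.util.List;

    public class AllocStallProbe {
        public static void main(String[] args) {
            List<byte[]> retained = new ArrayList<byte[]>();
            long worstNanos = 0;
            for (int i = 0; i < 200000; i++) {
                long start = System.nanoTime();
                byte[] chunk = new byte[64 * 1024]; // allocation being timed
                long elapsed = System.nanoTime() - start;
                if (elapsed > worstNanos) {
                    worstNanos = elapsed;
                    System.out.println("iteration " + i + ": allocation took "
                            + (elapsed / 1000000) + " ms");
                }
                retained.add(chunk);
                // Drop references periodically so the heap fills with garbage
                // rather than live data, mimicking a busy statement workload.
                if (retained.size() > 10000) {
                    retained.clear();
                }
            }
        }
    }

If allocations in a standalone probe like this never take more than a
few milliseconds under a comparable heap, the seconds-long stalls are
more likely waits inside Derby than raw allocation cost.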
> Note that the system utilization does not decrease immediately after
> performing the forced garbage collection. It is still just as busy, at
> about 40% CPU utilization.
Without understanding how many threads are active and their makeup,
40%, as I have said, is sort of meaningless. Derby's ability to use up
CPU is pretty much one CPU per active thread in the system.
> Just on a whim, I put “CALL
> SYSCS_UTIL.SYSCS_CHECKPOINT_DATABASE();” directly in front of my query
> (through SquirrelSQL) and executed both at the same time, so that I
> could eliminate the page cache as a performance helper. Still the same
> pattern. After the system runs for a few minutes at this really busy
> pace, with many thousands of statements, the query takes tens of
> seconds. Force a garbage collection and the query immediately takes
> milliseconds, and then it starts to degrade again.
Note that a checkpoint just writes dirty pages from the cache to disk;
it does not change the makeup of the cache. I guess it might make it
quicker to get a new page into the cache if you have filled the cache.
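If it helps to script this test outside SquirrelSQL, a minimal JDBC
sketch might look like the following (the query and table are
placeholders for your real index-based query):

    import java.sql.CallableStatement;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class CheckpointThenQuery {
        public static void main(String[] args) throws Exception {
            Connection conn = DriverManager.getConnection("jdbc:derby:demoDB");

            // Force a checkpoint first, so dirty-page writes are out of the
            // timing picture (this does not evict pages from the cache).
            CallableStatement cp = conn.prepareCall(
                    "CALL SYSCS_UTIL.SYSCS_CHECKPOINT_DATABASE()");
            cp.execute();
            cp.close();

            // Placeholder for the index-based query under test.
            PreparedStatement ps = conn.prepareStatement(
                    "SELECT NAME FROM T WHERE ID = ?");
            ps.setInt(1, 42);
            long start = System.nanoTime();
            ResultSet rs = ps.executeQuery();
            while (rs.next()) {
                rs.getString(1);
            }
            long ms = (System.nanoTime() - start) / 1000000;
            System.out.println("query took " + ms + " ms");
            rs.close();
            ps.close();
            conn.close();
        }
    }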
> Another data point: I also found that even when the parallel garbage
> collector performs a major collection on its own (I watch this happen
> in jvisualvm), the query immediately becomes responsive right after
> it.
> So I am wondering if there is some sort of cache in Derby that is
> getting really big and slow to search, and that gets cleaned up when
> a major garbage collection happens.
> Any thoughts on this?
> Any pointers to code to check or something to try in the code will be
> most welcome. I am comfortable with building Derby from source….
I don't think Derby uses any fancy Java stuff in its caches that would
interact with garbage collection the way you are looking for, but
things may have changed since I last looked at it. The interesting
caches I would look at would be the:

    page buffer cache
    query plan cache
    open file cache
Sounds like you have it down to something pretty reproducible in your
environment. Some suggestions:

1) Determine whether the "slow" query execution is using the same query
plan as the "fast" one. Maybe you have already done this as part of
your previous problem.
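One way to capture the plan for comparison, assuming the standard Derby
runtime-statistics procedures (run it once while the query is fast and
once while it is slow, then diff the output):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class PlanDump {
        public static void main(String[] args) throws Exception {
            Connection conn = DriverManager.getConnection("jdbc:derby:demoDB");
            Statement s = conn.createStatement();

            // Turn on runtime statistics collection for this connection.
            s.execute("CALL SYSCS_UTIL.SYSCS_SET_RUNTIMESTATISTICS(1)");

            // Placeholder for the query under test.
            ResultSet rs = s.executeQuery("SELECT NAME FROM T WHERE ID = 42");
            while (rs.next()) {
                rs.getString(1);
            }
            rs.close();

            // Fetch the plan and timings for the statement just executed.
            rs = s.executeQuery(
                    "VALUES SYSCS_UTIL.SYSCS_GET_RUNTIMESTATISTICS()");
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
            rs.close();
            s.close();
            conn.close();
        }
    }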
2) Does your app use BLOBs or CLOBs? I do know that in some situations
we need to keep locks on these when they are returned to users, and
those locks are cleared on GC if the application does not explicitly
release them. Your wait time of 20 seconds sounds a lot like the time
for a lock timeout. Does your application get lock timeouts and maybe
automatically retry? Maybe as a test you could set the lock timeout to
a different value and see whether the 20-second delay tracks it.
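A sketch of that test using the derby.locks.waitTimeout database
property (the 5-second value here is arbitrary, chosen only so a change
in the stall length would stand out):

    import java.sql.CallableStatement;
    import java.sql.Connection;
    import java.sql.DriverManager;

    public class LockTimeoutTest {
        public static void main(String[] args) throws Exception {
            Connection conn = DriverManager.getConnection("jdbc:derby:demoDB");

            // Lower the lock wait timeout to 5 seconds. If the 20-second
            // stalls shrink to roughly 5 seconds, they are lock waits, not
            // allocation or cache effects.
            CallableStatement cs = conn.prepareCall(
                    "CALL SYSCS_UTIL.SYSCS_SET_DATABASE_PROPERTY(?, ?)");
            cs.setString(1, "derby.locks.waitTimeout");
            cs.setString(2, "5");
            cs.execute();
            cs.close();
            conn.close();
            // The new value applies to subsequent lock requests; rerun the
            // workload after setting it.
        }
    }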
3) Once you eliminate the query plan, that kind of difference obviously
means your thread is waiting on something. Now you need to figure out
what. The full-on answer might be to run a performance analyser on the
system and compare when it runs fast and slow. If you can really force
the 20-second wait, maybe you can do the poor man's version of this and
just have the system dump the stacks of all the threads every second or
so, and see if you can catch the wait. The trick here is to be able to
identify the thread you are interested in, which is much harder in
network server mode than in embedded mode. What I would be interested
in…
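A poor man's sampler along those lines, as an in-process sketch (it
uses only the standard Thread.getAllStackTraces() API; running jstack
against the server's pid in a shell loop would work just as well for
network server mode):

    import java.util.Map;

    public class StackSampler {
        public static void main(String[] args) {
            // Start this inside the JVM under test, or adapt the body to
            // run from your application's startup code.
            Thread sampler = new Thread(new Runnable() {
                public void run() {
                    while (true) {
                        Map<Thread, StackTraceElement[]> all =
                                Thread.getAllStackTraces();
                        System.out.println("=== sample at "
                                + System.currentTimeMillis() + " ===");
                        for (Map.Entry<Thread, StackTraceElement[]> e
                                : all.entrySet()) {
                            System.out.println(e.getKey().getName()
                                    + " [" + e.getKey().getState() + "]");
                            for (StackTraceElement frame : e.getValue()) {
                                System.out.println("    at " + frame);
                            }
                        }
                        try {
                            Thread.sleep(1000); // one sample per second
                        } catch (InterruptedException ie) {
                            return;
                        }
                    }
                }
            });
            sampler.setDaemon(true);
            sampler.start();
            // ... the application/workload keeps running here ...
        }
    }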
> Brett