-----Original Message-----
From: Mike Matrigali [mailto:[email protected]]
Sent: Tuesday, March 27, 2012 8:33 PM
To: [email protected]
Subject: Re: Really need some help here please. I think I am having a problem 
with some sort of caching in Derby

Bergquist, Brett wrote:
> Derby is giving me fits but I just found a reproducible pattern.   The
> problem that I am having is that queries that are totally index based
> without any changing of the query plan are becoming slow.   Normally the
> query takes about 6 milliseconds and it quickly degrades to 20 seconds
> or more.
>
>
>
> Here is some  background.  The system is quite busy with about 40%
> system utilization for Derby.  Heap used space is around 2Gb out of
> 4Gb available.  Java is Oracle Java 1.6 and using the UseParallelGC
> parameters for garbage collection.
>
>
>
> There are thousands of statements being executed and most of these are
> new statements so few prepared statements are probably being used.
Is it possible to change this? Are all the statements really different, or 
could they be rewritten to be parameterized?  Even if individual connections 
don't do:
prepare
execute prepared with params
and instead just do
execute statement with params

derby has a cache of statements, so you could possibly eliminate a lot of the 
class garbage.
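To illustrate why the parameterization matters for that cache, here is a toy sketch (not Derby's actual implementation; Derby keys its statement cache on the SQL text, so this just models the consequence): embedded literals make every execution a distinct cache key, while a `?` marker lets every execution share one compiled plan. The table and column names are taken from the example later in this mail.

```python
from collections import OrderedDict

class StatementCache:
    """Tiny LRU cache keyed on SQL text, standing in for Derby's cache."""
    def __init__(self, max_size=100):
        self.max_size = max_size
        self.entries = OrderedDict()
        self.compiles = 0                      # count of "expensive" compiles

    def prepare(self, sql):
        if sql in self.entries:
            self.entries.move_to_end(sql)      # cache hit: reuse the plan
            return self.entries[sql]
        self.compiles += 1                     # cache miss: compile the plan
        plan = f"plan-for:{sql}"
        self.entries[sql] = plan
        if len(self.entries) > self.max_size:
            self.entries.popitem(last=False)   # evict least recently used
        return plan

# Literals embedded in the text: every user produces a distinct cache key.
cache = StatementCache(max_size=100)
for user in range(1000):
    cache.prepare(f"SELECT * FROM j2ee_users WHERE username = 'user{user}'")
literal_compiles = cache.compiles              # 1000 compiles, cache thrashes

# Parameter marker: one cache key, one compile, 999 cache hits.
cache2 = StatementCache(max_size=100)
for user in range(1000):
    cache2.prepare("SELECT * FROM j2ee_users WHERE username = ?")
```

With the literal form the cache also never holds more than the last 100 distinct texts, so nothing is ever reused.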

[bb] Unfortunately the queries are generated through the Java Persistence API 
(JPA) using EclipseLink, so changing what it generates is not practical at this 
time.

I did find one problem yesterday related to the statement cache: it is very 
small by default (100 statements), and the property to change this appears in 
none of the documentation (manuals).  Zero.  I found it by going through the 
code.  There is a reference to disabling the statement cache in the Wiki but 
nothing else other than this bug:

https://issues.apache.org/jira/browse/DERBY-4280

So with many unique statements, this cache is really thrashing.  Overnight, I 
increased this cache to 10000 statements and my test setup was able to run 
continuously with negligible increased memory consumption.

Also, looking at the cache, I think it could be improved here.  One thought 
that comes to mind is to cache only things that are "expensive".  Take a simple 
query against one table that returns a row via an indexed lookup; this seems to 
compile very quickly.  As an example, the application server can use a JDBC 
Realm for authentication, and each time a user is authenticated, the 
authentication time is logged.  With a web server that authenticates and is hit 
very often, a statement such as "update j2ee_users set last_login_tme = 
current_time() where username = 'xxx'" is executed constantly.  The application 
server generates the code, so rewriting this to use a prepared statement is not 
practical.  Yet this statement can quickly thrash the statement cache, because 
the statement text (the username literal) is unique on each call.  The 
statement compiles very quickly, but it floods the cache and flushes out other 
statements that may be more expensive to recompile.

So my thinking might be to weight the replacement of an entry in the cache by 
the cost of the statement: don't replace a statement that compiles and 
optimizes at a high cost with one that compiles and optimizes at very little 
cost.  Maybe add in an age factor, such that if a cached statement has not been 
hit for a long time it is replaced anyway.  So the replacement would take into 
account both the cost and the age of a statement.
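The cost-plus-age replacement idea could look something like the following sketch.  This is not Derby code, just the policy: score each entry by its compile cost, decayed by how long it has gone unused, and only let a newcomer displace an entry whose decayed score is lower than the newcomer's own cost.  The half-life constant and the SQL strings are illustrative assumptions.

```python
class CostAwareCache:
    """Replacement by compile cost decayed with age, instead of pure LRU."""
    def __init__(self, max_size, half_life=60.0):
        self.max_size = max_size
        self.half_life = half_life        # seconds for an entry's score to halve
        self.entries = {}                 # sql -> {'cost': ..., 'last_hit': ...}

    def _score(self, entry, now):
        age = now - entry['last_hit']
        return entry['cost'] * 0.5 ** (age / self.half_life)

    def put(self, sql, compile_cost, now):
        if sql not in self.entries and len(self.entries) >= self.max_size:
            victim = min(self.entries,
                         key=lambda k: self._score(self.entries[k], now))
            if self._score(self.entries[victim], now) >= compile_cost:
                return                    # too cheap to displace anything
            del self.entries[victim]
        self.entries[sql] = {'cost': compile_cost, 'last_hit': now}

join_sql = "SELECT ... (expensive 7-table join)"
upd_sql = "UPDATE j2ee_users SET last_login_tme = CURRENT_TIME WHERE username = 'xxx'"

cache = CostAwareCache(max_size=1)
cache.put(join_sql, compile_cost=12.0, now=0.0)
cache.put(upd_sql, compile_cost=0.01, now=1.0)     # rejected: join still scores higher
cache.put(upd_sql, compile_cost=0.01, now=2000.0)  # join idle ~33 min: finally replaced
```

So the cheap update cannot flood out the expensive join plan while the join is warm, but once the join has gone unused long enough its decayed score drops below even a cheap newcomer and it is replaced.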

Just thinking now, a quick and dirty approach might be to never cache a 
statement that only uses one table, as this should be pretty quick to compile: 
the optimizer needs to choose a table scan or an index scan based on the 
constraints, and not much more.  So the "update j2ee_users" example above would 
not be cached and would not flood the cache.

I also played with "derby.language.statementCacheSize" and set it to 0 just to 
force a test case.  When I did this, a sequence of statements such as:

CALL SYSCS_UTIL.SYSCS_CHECKPOINT_DATABASE();
CALL SYSCS_UTIL.SYSCS_SET_RUNTIMESTATISTICS(1);
SELECT * ....
VALUES SYSCS_UTIL.SYSCS_GET_RUNTIMESTATISTICS();
CALL SYSCS_UTIL.SYSCS_SET_RUNTIMESTATISTICS(0);

fails.  Rather, the "VALUES SYSCS_UTIL.SYSCS_GET_RUNTIMESTATISTICS();" fails.  
Probably this is looking in the cache to get the statistics, and now there is 
no cached statement, so no statistics.  So the idea above of not caching 
certain things would probably need some work.  I bet a good test case for this 
would be to set the statement cache size to 1 and execute the above sequence; 
it would probably fail right away, as the "VALUES ..." would replace the 
"SELECT ..." in the cache and then not find the statistics for the "SELECT ..."
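That guessed-at failure mode with a one-entry cache can be simulated directly.  A hedged sketch (this models my guess about the behavior, not Derby's actual internals): if the runtime statistics live with the cached statement, then the statement that fetches them evicts the very entry it was supposed to report on.

```python
from collections import OrderedDict

CACHE_SIZE = 1
cache = OrderedDict()                      # sql -> its compiled plan + stats

def execute(sql):
    """Compile (or reuse) a statement; evicting an entry loses its stats."""
    if sql not in cache:
        if len(cache) >= CACHE_SIZE:
            cache.popitem(last=False)      # evict the oldest entry
        cache[sql] = {'stats': f'runtime stats for: {sql}'}
    cache.move_to_end(sql)

select_sql = "SELECT * FROM t WHERE id = 1"
stats_sql = "VALUES SYSCS_UTIL.SYSCS_GET_RUNTIMESTATISTICS()"

execute(select_sql)
execute(stats_sql)        # this statement itself evicts the SELECT
select_still_cached = select_sql in cache  # False: its statistics are gone too
```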


> Using jvisualvm, I can see new classes being created and loaded every
> second.
>
>
>
> JvisualVM is showing a GC Activity of about 1.5% so the system is not
> being hammered by the garbage collector.   Also the "old" memory is
> growing really slowly.
>
>
>
> So what I found is that if I hit the "Perform GC" button on JvisualVM
> to perform an explicit garbage collection, the same query that was
> taking
> 20 seconds or more instantaneously comes down to taking about 6
> milliseconds.  Its processing time then starts to creep up back
> towards
> 20 seconds or more.
Is it possible that the jvm's ability to satisfy derby's memory requests just 
slows down as more garbage is put in the pool?  Garbage has always been a 
performance problem for derby, so a lot of its code is written to avoid 
creating garbage where possible.  So the caches of pages reuse the arrays for 
the pages, and a scan of a million rows uses a small number of container rows 
that are reused.  I guess the real question is whether the jvm can start taking 
on the order of seconds to return a memory request; if that happens, I could 
come up with many places in derby code where we might be holding latches that 
could stall everyone, and maybe those stalls are busy waits, like:
lock manager critical section
buffer manager critical section
log system

[bb]
As a further test, I disabled all activity to the database so that no other 
statements were being executed at all.  CPU utilization dropped to near zero.  
The heap in use was about 2.5Gb out of 4Gb.  Note that the JVM was started with 
-Xmx4096m -Xms4096m, so the heap is fully allocated at startup and never grows. 
I ran the query and it took a long time.  Then I ran the test all over again, 
this time generating significant utilization, shut everything down, forced a 
garbage collection, and ran the query: it completed in a couple of seconds.

Now this might just be a coincidence with the query being in the statement 
cache: maybe I hit it just right on each test, such that the statement was not 
in the cache when the query took a long time, and was in the cache after I did 
the garbage collection.

>
>
>
> Note that the system utilization does not decrease immediately after
> performing the forced garbage collection.  It is still just as busy at
> about 40% cpu utilization.
Without understanding how many threads are active and their makeup, like I have 
said, 40% is sort of meaningless.  Derby's ability to use up CPU is pretty much 
one core per active thread in the system.
>

[bb] The test system is an Oracle M3000 with 8 cores and 32Gb of memory, so it 
was not CPU bound, I/O bound (especially when I shut down everything accessing 
the database), or memory bound.

>
>
> Just on a whim, right now, I put "CALL
> SYSCS_UTIL.SYSCS_CHECKPOINT_DATABASE();" just in front of my query
> (through SquirrelSQL) and executed both at the same time so that I could
> eliminate the page cache as being a performance helper.   Still the same
> pattern.      After the system runs for a few minutes at this really
> busy, many thousands of statements, trying the query takes tens of
> seconds.   Force a garbage collection and the query immediately takes
> milliseconds and then starts to degrade.
>
Note that a checkpoint just writes dirty pages from the cache to disk; it does 
not change the makeup of the cache.  I guess it might make it quicker to get a 
new page into the cache if the cache is full.

[bb] So the Derby tuning guide should mention this.  Page 21 of that guide 
says: "Create a benchmark test suite against which to run the sample queries. 
The first thing the test suite should do is checkpoint data (force Derby to 
flush data to disk)."

It then goes on to say:

Use performance timings to identify poorly performing queries. Try to 
distinguish
between cached and uncached data. Focus on measuring operations on uncached
data (data not already in memory). For example, the first time you run a query,
Derby returns uncached data. If you run the same query immediately afterward,
Derby is probably returning cached data. The performance of these two otherwise
identical statements varies significantly and skews results.

So as a user, this really seemed to imply that 
"SYSCS_UTIL.SYSCS_CHECKPOINT_DATABASE();" would help do this.  It would 
probably be better to supply a system procedure that flushes the page cache, or 
at least to document setting the page cache size to something small (the 
documentation says that 40 pages is the minimum, but I would need to check the 
code).
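For reference, both properties can go in derby.properties (or be passed as JVM system properties).  A hedged example matching the overnight experiment above; the statement cache property is the undocumented one found in the code, and the values are what I used, not recommendations:

```properties
# derby.properties
# Undocumented as of this writing (see DERBY-4280); default is 100 statements.
derby.language.statementCacheSize=10000
# Page cache size in pages; the documentation states 40 is the minimum.
derby.storage.pageCacheSize=1000
```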

>
>
> Another point, I also found that even when the parallel garbage
> collection performs a major collection on its own (I watch this happen
> with jvisualvm), that the query immediately becomes responsive right
> after this.
>
>
>
> So I am wondering if there is some sort of cache in Derby that is
> getting really big and slow to find something in that gets cleanup
> when an major garbage collection happens.
>
>
>
> Any thoughts on this?
>
>
>
> Any pointers to code to check or something to try in the code will be
> most welcome.  I am comfortable with building Derby from source....
I don't think derby uses any fancy java stuff in its caches that would interact 
with garbage collection the way you are looking for, but things may have 
changed since I last looked.  The interesting caches I would look at are:
page buffer cache
query plan cache
open file cache

[bb] I went through the code looking for WeakReference, WeakHashMap, or 
SoftReference, which are the things the GC could force to be freed.  I only 
found a few, and they don't seem relevant as far as I can tell.  There was only 
one WeakHashMap whose keyset was being iterated where I did not see the entries 
being cleared; I believe it was in a connection class and may be related to XA 
processing (which we are doing a lot of).  I will look closer.

Sounds like you have it down to
pretty reproducible in your environment.  Some suggestions:
1) determine if the "slow" query execution is using the same query plan
    as the "fast" one.   Maybe you have already done this as part of your
    previous problem.

[bb] With the query that I was testing, I now believe it is the statement cache 
problem.  The query joins 7 tables; the plan the optimizer finds is the best 
plan, but it takes a long time to compile, so the optimizer is probably 
evaluating something like 7! join-order combinations.  Through SquirrelSQL, 
when the statement is not in the cache, it takes about 12 seconds to compile 
and execute.  If I force the join order with "joinOrder=fixed", it takes about 
0.169 seconds to execute, so I am pretty sure the time is spent in the 
optimizer.

One point from looking at the code is that the optimizer seems to be unbounded 
in time.  I see code checking for a maximum time, but this starts at the 
maximum double value and only takes effect if the optimizer determines that the 
time to create the plan is going to be greater than just executing the query.  
Maybe there should be a configurable parameter that specifies a maximum 
optimizer time, so that it gives up after, say, 2 seconds and just executes the 
best plan it has so far.  I need to look at this more.
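The bounded-optimizer idea is essentially best-effort enumeration with a deadline.  A sketch of the shape of it (the cost model and 2-second budget are illustrative assumptions; Derby's real optimizer times out on estimated cost, not a fixed wall-clock cap):

```python
import itertools
import time

def best_join_order(tables, estimate_cost, budget_seconds):
    """Enumerate join orders, but stop after budget_seconds and
    return the best complete plan found so far."""
    deadline = time.monotonic() + budget_seconds
    best, best_cost = None, float('inf')
    for order in itertools.permutations(tables):
        cost = estimate_cost(order)
        if cost < best_cost:
            best, best_cost = order, cost
        if time.monotonic() >= deadline:
            break            # budget exhausted: run with what we have
    return best, best_cost

# Toy cost model (pure illustration): cost is position-weighted table size,
# so the cheapest plan joins the largest table first.
tables = list(range(7))      # 7 tables -> up to 7! = 5040 join orders
plan, cost = best_join_order(
    tables, lambda o: sum(i * t for i, t in enumerate(o)), budget_seconds=2.0)
```

With a tiny budget the function still returns a usable (if suboptimal) plan, which is the point: the worst case degrades to "run the first plan found" rather than "spend 12 seconds optimizing".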

2) does your app use blobs or clobs?  I do know that in some situations we need 
to keep locks on these when they are returned to users, and those locks are 
cleared on gc if the application does not explicitly release them.  Your wait 
time of 20 seconds sounds a lot like the time for a lock timeout.  Does your 
application get lock timeouts and maybe automatically retry?  Maybe as a test 
you could set

[bb] No blobs or clobs.  Locks can be held for a while, and that is an artifact 
of the design, which may have some problems.  The code talks to the database 
and to a physical network device over a communications link that is pretty 
oversubscribed.  The system keeps a "model" copy of what is configured in the 
device, so each interaction first updates the database (the model) and then the 
device.  If a failure occurs in communicating with the device, or in actually 
updating the device configuration, the database transaction is rolled back.  So 
the general flow is:

1. query and update database
2. query and update device
3. commit or rollback changes to the database

So locks can be held for a period of time.  The protocol to talk to the device 
is SNMP, with timeouts of something like 5 seconds and 3 retries (plus the 
original) for a single SNMP message, and there may be hundreds of messages 
transferred between the code and the device.  For the most part, optimistic 
locking (JPA) is used, but I found a few places in the system where the 
database operations were flushed to Derby, which can cause row and table locks 
to be acquired.
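The three-step flow above can be sketched as follows.  All names here are illustrative stand-ins, not the real system (which goes through JPA/Derby and SNMP); the point is just the shape: the database transaction stays open across the slow device interaction, which is why locks are held so long.

```python
class ModelStore:
    """Stand-in for the Derby-backed model with commit/rollback semantics."""
    def __init__(self):
        self.committed = {}
        self.pending = {}

    def update(self, key, value):
        self.pending[key] = value              # uncommitted model change

    def commit(self):
        self.committed.update(self.pending)
        self.pending.clear()

    def rollback(self):
        self.pending.clear()

def configure(store, device_update, key, value):
    store.update(key, value)           # 1. query and update database (model)
    try:
        device_update(key, value)      # 2. query and update device (slow SNMP)
    except Exception:
        store.rollback()               # device failed: discard the model change
        raise
    store.commit()                     # 3. commit changes to the database

store = ModelStore()
configure(store, lambda k, v: None, "ifSpeed", 100)   # device ok -> committed

def flaky_device(k, v):
    raise TimeoutError("SNMP timeout after 3 retries")

try:
    configure(store, flaky_device, "ifSpeed", 200)    # device fails -> rolled back
except TimeoutError:
    pass
```

Any database locks taken in step 1 are held for the full duration of step 2, which at 5 seconds and 3 retries per SNMP message can be a long time.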

3) once you eliminate the query plan, then that kind of difference obviously
    means your thread is waiting on something, and you need to figure out
    what.  The full-on answer might be to run a performance analyser on the
    system and compare when it runs fast and slow.  If you can really force
    the 20 second wait, maybe you can do the poor man's version of this: just 
have the system dump the stacks of all threads every second or so and see if 
you can catch the wait.  The trick here is being able to identify the thread 
you are interested in, which is much harder in network server vs embedded mode. 
What I would be inter
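That poor man's profiler is just periodic stack sampling.  For the JVM you would loop `jstack <pid>` (or send SIGQUIT) once a second and grep the dumps; the same idea sketched in Python, which is easier to demonstrate self-contained:

```python
import collections
import sys
import threading
import time

def sample_stacks(duration, interval=0.05):
    """Periodically capture every thread's current frame and count
    which function each thread was caught in."""
    seen = collections.Counter()
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        for frame in sys._current_frames().values():
            seen[frame.f_code.co_name] += 1
        time.sleep(interval)
    return seen

def slow_worker(stop):
    while not stop.is_set():
        time.sleep(0.01)     # stands in for a thread stuck in a long wait

stop = threading.Event()
worker = threading.Thread(target=slow_worker, args=(stop,))
worker.start()
samples = sample_stacks(duration=0.5)
stop.set()
worker.join()
# The function where the "slow" thread spends its time dominates the samples,
# which is exactly how the repeated jstack dumps would expose the 20s wait.
```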
>
>
>
> Brett
>
>
>

[bb] Thanks for your thoughts, Mike.  I think the statement cache is probably 
one of the biggest problems I am seeing, now that I understand it better.  
Looking into the optimizer also gives me a hint as to why some statements take 
a long time to optimize.  Unfortunately, some of the queries being done behind 
the scenes are difficult to change.  Also, testing this beast (my application) 
is difficult just from the logistical problem of (correctly) simulating tens of 
thousands of network devices, so in some of our testing we have not really hit 
these problems, and they are being found in production.





