RE: How to present a heap dump to the list - Was: Had Derby 10.8.2.2 fail today and need some thoughts

Bergquist, Brett Thu, 01 Mar 2012 09:53:17 -0800

Looking at a heap dump today, it does appear that the number of instances 
stabilizes, so the growth is not unbounded but rather a function of the 
transaction rate, as indicated.  That is good!


I do think that the proposed, more complex solution would be better.  Looking 
at the patch attached to the issue, the factory could provide a separate timer 
for XA transactions so as not to affect the generic statement timeout 
capability.  The more complex solution could then call "purge" every X cancels 
or every N seconds, for example.  Either or both could be properties 
configurable in derby.properties.
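Here is a minimal Java sketch of what a dedicated XA timeout timer with count-based purging might look like. The class name XATimeoutTimer and the purgeEveryCancels threshold are hypothetical and are not Derby APIs. In a real patch the threshold would presumably come from a derby.properties setting. The sketch uses only the JDK's own Timer.schedule, TimerTask.cancel, and Timer.purge.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch, not Derby code: a timer dedicated to XA timeouts,
 * kept separate from the statement-timeout timer, that purges cancelled
 * tasks after every purgeEveryCancels cancellations.
 */
final class XATimeoutTimer {
    private final Timer timer = new Timer("xa-timeout-timer", true); // daemon thread
    private final int purgeEveryCancels; // assumed to come from a configurable property
    private final AtomicInteger cancelsSincePurge = new AtomicInteger();

    XATimeoutTimer(int purgeEveryCancels) {
        if (purgeEveryCancels <= 0) {
            throw new IllegalArgumentException("purgeEveryCancels must be > 0");
        }
        this.purgeEveryCancels = purgeEveryCancels;
    }

    void schedule(TimerTask task, long delayMillis) {
        timer.schedule(task, delayMillis);
    }

    /**
     * Cancels the task and, on every purgeEveryCancels-th call, purges the queue.
     * Returns the number of tasks the purge removed, or 0 if no purge ran.
     */
    int cancel(TimerTask task) {
        task.cancel(); // marks the task cancelled; Timer still holds it in its queue
        if (cancelsSincePurge.incrementAndGet() >= purgeEveryCancels) {
            cancelsSincePurge.set(0); // concurrent callers may trigger an extra, harmless purge
            return timer.purge();
        }
        return 0;
    }

    int purge() {
        return timer.purge();
    }

    void shutdown() {
        timer.cancel();
    }
}
```

A time-based variant would instead schedule a repeating task with Timer.scheduleAtFixedRate that calls purge(). Either trigger avoids paying for purge(), which the Javadoc describes as trading time for space, on every single cancel.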

-----Original Message-----
From: Kristian Waagan [mailto:[email protected]] 
Sent: Thursday, March 01, 2012 10:12 AM
To: [email protected]
Subject: Re: How to present a heap dump to the list - Was: Had Derby 10.8.2.2 
fail today and need some thoughts

On 01.03.2012 15:55, Bergquist, Brett wrote:
> I think I can explain the 
> "org.apache.derby.jdbc.XATransactionState$CancelXATransactionTask" instances 
> that are piling up in memory.

I haven't read what you say below, as I really have to run, but see also
DERBY-4137 [1].
Maybe you've found a good reason why we should implement the more complex solution :)


Cheers,
--
Kristian

[1] https://issues.apache.org/jira/browse/DERBY-4137

>
> I have derby.properties configured to time out XA transactions after 15 
> minutes.  The system is processing about 6M XA transactions a day, or about 
> 70/second, and 99.9% of these succeed and are not canceled by the 
> Derby XA transaction timer.  So each of them invokes "cancel" when the XA 
> transaction is finalized, canceled, or committed, and the 
> CancelXATransactionTask instance is marked as canceled when this is done.
>
> However, java.util.Timer does not remove these canceled tasks until their 
> originally scheduled time arrives.  At that point the task is seen as 
> canceled and removed from the queue that java.util.Timer maintains.
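The behaviour described above can be reproduced outside Derby with a few lines of plain java.util.Timer code. This is a minimal sketch, and the class and method names are made up for illustration. It schedules tasks with a 15-minute delay, cancels all of them, and then asks purge() how many cancelled tasks it had to remove. If cancel() had dropped them from the queue, that number would be 0.

```java
import java.util.Timer;
import java.util.TimerTask;

/** Illustrative demo: Timer keeps cancelled tasks queued until they come due or purge() runs. */
final class CancelRetentionDemo {
    private static final long FIFTEEN_MINUTES_MS = 15L * 60 * 1000;

    /**
     * Schedules n no-op tasks 15 minutes out, cancels all of them, and returns
     * how many cancelled tasks purge() then removes from the timer's queue.
     */
    static int retainedAfterCancel(int n) {
        Timer timer = new Timer(true); // daemon thread, so it cannot keep the JVM alive
        try {
            TimerTask[] tasks = new TimerTask[n];
            for (int i = 0; i < n; i++) {
                tasks[i] = new TimerTask() {
                    @Override
                    public void run() { }
                };
                timer.schedule(tasks[i], FIFTEEN_MINUTES_MS);
            }
            for (TimerTask task : tasks) {
                task.cancel(); // only marks the task cancelled; it stays in the queue
            }
            return timer.purge(); // count of cancelled tasks that were still queued
        } finally {
            timer.cancel(); // stop the timer thread
        }
    }
}
```

Calling CancelRetentionDemo.retainedAfterCancel(1000) returns 1000, because none of the cancelled tasks leave the queue until purge() runs.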
>
> So from what I see, CancelXATransactionTask instances are being added to the 
> java.util.Timer queue at a rate of about 70/second and are being canceled, 
> but each of them stays in the queue for 15 minutes.  So they build up in the 
> queue for the first 15 minutes and only then start being purged.  I guess 
> this should reach a steady state after a while, where additions to and 
> removals from the queue reach an equilibrium.
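The equilibrium can be estimated with Little's law: queue length = arrival rate × time spent in the queue. The back-of-the-envelope sketch below assumes every transaction leaves exactly one cancelled task in the queue for the full 15-minute (900-second) timeout. The class and method names are illustrative only.

```java
/** Back-of-the-envelope estimate using Little's law: L = arrival rate x time in queue. */
final class SteadyStateEstimate {
    static final long SECONDS_PER_DAY = 24L * 60 * 60; // 86,400

    /**
     * Expected number of cancelled-but-not-yet-due timeout tasks held by the
     * Timer, assuming each transaction leaves one task queued for the full timeout.
     */
    static long queuedTasks(long transactionsPerDay, long timeoutSeconds) {
        // (transactionsPerDay / SECONDS_PER_DAY) * timeoutSeconds, multiplied
        // first so integer division does not discard the fractional rate.
        return transactionsPerDay * timeoutSeconds / SECONDS_PER_DAY;
    }
}
```

With the figures from this thread, queuedTasks(6_000_000, 900) gives 62,500 tasks (about 69.4 per second held for 900 seconds). This is an average-rate estimate, not a measurement. The count seen in any particular heap dump will also depend on the load around the time it was taken.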
>
> The java.util.Timer class does have a "purge" method.  Since most XA 
> transactions are not going to time out and their CancelXATransactionTask is 
> going to be canceled, I am wondering whether it might be wise to schedule a 
> periodic call to "purge" to remove these canceled tasks from the Timer's 
> queue early?
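A periodic purge could be scheduled on the existing timer itself. The sketch below is hypothetical: PeriodicPurge is not part of Derby or the JDK, and the period is an assumed tuning knob. It relies only on Timer.scheduleAtFixedRate and Timer.purge. The Timer.purge Javadoc states that calling purge from within a task scheduled on the same timer is permitted.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical helper: periodically purges cancelled tasks from an existing Timer. */
final class PeriodicPurge {
    private PeriodicPurge() { }

    /**
     * Schedules a repeating task on the given timer that calls purge() every
     * periodMillis and adds the number of removed tasks to removedCounter.
     * Returns the purge task so the caller can cancel it later.
     */
    static TimerTask schedule(Timer timer, long periodMillis, AtomicLong removedCounter) {
        TimerTask purgeTask = new TimerTask() {
            @Override
            public void run() {
                // The Timer.purge() Javadoc explicitly permits calling it
                // from within a task scheduled on the same timer.
                removedCounter.addAndGet(timer.purge());
            }
        };
        timer.scheduleAtFixedRate(purgeTask, periodMillis, periodMillis);
        return purgeTask;
    }
}
```

The returned TimerTask can be cancelled to stop purging. The counter gives a rough view of how many cancelled tasks the purges have removed.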
>
>
>
> -----Original Message-----
> From: Bergquist, Brett [mailto:[email protected]]
> Sent: Thursday, March 01, 2012 9:01 AM
> To: [email protected]
> Subject: How to present a heap dump to the list - Was: Had Derby 
> 10.8.2.2 fail today and need some thoughts
>
> I have a couple of heap dumps that I can bring up in jvisualvm.  One is an 
> 8 GB dump from when Derby hit an OOM during a large query, and the other is 
> just a snapshot of the heap of a running system, in which I see about 121K 
> instances of 
> "org.apache.derby.jdbc.XATransactionState$CancelXATransactionTask".  
> Jvisualvm is able to load these heap dumps but does not seem to have a good 
> way of outputting a report.
>
> I could do a screen capture and attach a JPEG, but I want to make sure this 
> is proper etiquette for the list before doing so.
>
> So any advice on this will be appreciated.
>
> Brett
>
> -----Original Message-----
> From: Bergquist, Brett [mailto:[email protected]]
> Sent: Tuesday, February 28, 2012 7:41 PM
> To: [email protected]
> Subject: RE: Had Derby 10.8.2.2 fail today and need some thoughts
>
> The hardware is an Oracle M5000 running Solaris 10.  The database engine is 
> controlled through Solaris SMF.  derby.properties is set up to always append 
> to the log, never truncate it.  There is definitely not more than one process 
> starting the Network Server.
>
> Here is another tidbit.   Even though the log says the Derby engine was 
> shutting down, the process was still alive (prstat showed this) and the SMF 
> log also showed that the Network Server process never terminated.
>
> From a previous capture of the "derby.log", it also appears this happened 
> on Feb 24th, but the server seemed to recover and continue running.  
> Strange?
>
> On a separate thought, in case this has to do with an out-of-memory 
> condition: I was using jvisualvm the other day to monitor the heap and garbage 
> collection on a test system that tries to mimic this setup (Oracle M3000).  
> I did a heap dump and noticed many objects relating to XA transaction timeout 
> timers, about 600K of them.  They seemed to be owned by 
> java.util.TimerTask.  I remember seeing a Derby issue updated within the last 
> day or two about java.util.TimerTask holding on to some things.  I 
> bring this up because we recently added the property to derby.properties to 
> cancel XA transactions if they take too long.  We probably do about 6M XA 
> transactions a day, and I am wondering if there is some sort of leak 
> here.  I will try my setup and look again tomorrow using jvisualvm.
>
> I appreciate your thoughts.
>
> Brett
> ________________________________________
> From: Mike Matrigali [[email protected]]
> Sent: Tuesday, February 28, 2012 6:40 PM
> To: [email protected]
> Subject: Re: Had Derby 10.8.2.2 fail today and need some thoughts
>
> Don't know, but more info might help.  A catch-22 is that the top of 
> the derby.log usually has all the environment info that could be useful.
>
> Can you list the OS and JVM?  I/O handling to the log is likely an OS thing.  
> What properties do you have set, any special ones for error logging?
>
> I would first look for multiple starts and stops of Derby, with one of 
> them using the default setting, which truncates the derby.log.
> Maybe that led to multiple processes trying to write to the same derby.log.
>
> /mikem
>
> Bergquist, Brett wrote:
>> Our customer called and said the server was not working.   Before
>> restarting, I retrieved the derby.log and it seems strange because 
>> this was right at the top of the log:
>>
>> ----------------------------------------------------------------
>>
>> Tue Feb 28 15:33:58 EST 2012: Shutting down Derby engine
>>
>> ----------------------------------------------------------------
>>
>> Tue Feb 28 15:37:28 EST 2012 Thread[DRDAConnThread_245,5,main] (XID = 
>> 1482003981), (SESSIONID = 133257672), (DATABASE = csemdb), (DRDAID = 
>> ????????.??-4471791624540787385{335696}), Cleanup action starting
>>
>> Tue Feb 28 15:37:28 EST 2012 Thread[DRDAConnThread_245,5,main] (XID = 
>> 1482003981), (SESSIONID = 133257672), (DATABASE = csemdb), (DRDAID = 
>> ????????.??-4471791624540787385{335696}), Failed Statement is: null
>>
>> java.lang.NullPointerException
>>
>> Cleanup action completed
>>
>> The strange part is that "Shutting down Derby engine" is the first thing in
>> the log.  The server had been up and running since 2/16/2012 with no
>> problems, with about 60M transactions processed, a backup performed each
>> night, etc.  And then this.
>>
>> So, any thoughts on how the derby.log could have been recreated with this as
>> the first thing in it?  I am thinking maybe an OutOfMemory
>> condition, but all traces of whatever went wrong are gone, so I am 
>> trying to work backwards and see how the derby.log could have been created 
>> with this as the first entry.
>>
>> Thanks for any help!
>>
>> Brett