RE: How to present a heap dump to the list - Was: Had Derby 10.8.2.2 fail today and need some thoughts

Bergquist, Brett Thu, 01 Mar 2012 07:21:22 -0800

That is exactly what I just saw looking at the code.   Thanks for pointing this 
to me.


-----Original Message-----
From: Kristian Waagan [mailto:[email protected]] 
Sent: Thursday, March 01, 2012 10:12 AM
To: [email protected]
Subject: Re: How to present a heap dump to the list - Was: Had Derby 10.8.2.2 
fail today and need some thoughts

On 01.03.2012 15:55, Bergquist, Brett wrote:
> I think I can explain the 
> "org.apache.derby.jdbc.XATransactionState$CancelXATransactionTask" instances 
> that are piling up in memory.

Haven't read what you say below, I really have to run, but see also
DERBY-4137 [1].
Maybe you're a good reason why we should implement the more complex solution :)


Cheers,
--
Kristian

[1] https://issues.apache.org/jira/browse/DERBY-4137

>
> I have derby.properties configured to timeout XA transactions at 15 minutes.  
>  The system is processing about 6M or so XA transactions a day or about 
> 70/second and 99.9% of these are going to work and not be canceled by the 
> Derby XA transaction timer.   So each of these invokes "cancel" when the XA 
> transaction is finalized or canceled or committed.  The 
> CancelXATransactionTask instance is marked as canceled when this done.
>
> The java.util.Timer does not remove these canceled tasks until their original 
> schedule time occurs however.  When that happens, the task is seen as 
> canceled and removed from the queue that is maintained by java.util.Timer.
>
> So from what I see, CancelXATransactionTask are being added to the 
> java.util.Timer queue at a rate of about 70/second and are being canceled but 
> each of these is sticking around in the queue for 15 minutes.  So these build 
> up in the queue for at least 15 minutes of time and then start being purged 
> out.  I guess this should reach a steady state after a while where adding to 
> the queue and the removing from the queue reaches an equilibrium.
>
> The java.util.Timer class does have "purge" method.  Since most XA 
> transactions are not going to timeout and the CancelXATransactionTask is 
> going to be canceled, I am wondering if it might not be wise to schedule a 
> call to "purge" periodically to remove these canceled timers from the Timer's 
> queue early?
>
>
>
> -----Original Message-----
> From: Bergquist, Brett [mailto:[email protected]]
> Sent: Thursday, March 01, 2012 9:01 AM
> To: [email protected]
> Subject: How to present a heap dump to the list - Was: Had Derby 
> 10.8.2.2 fail today and need some thoughts
>
> I have a couple of heap dumps that I can bring up in jvisualvm.    One is an 
> 8Gb dump of where Derby had an OOM when doing a large query and one is just a 
> snapshot of the heap of a running system that I see about 121K  instances of 
> "org.apache.derby.jdbc.XATransactionState$CancelXATransactionTask".   
> Jvisualvm is able to load these heap dumps but does not seem to have a good 
> way of outputting a report.
>
> I could do a screen capture and attach a JPEG, but I want to make sure this 
> is proper etiquette for the list before doing so.
>
> So any advice on this will be appreciated.
>
> Brett
>
> -----Original Message-----
> From: Bergquist, Brett [mailto:[email protected]]
> Sent: Tuesday, February 28, 2012 7:41 PM
> To: [email protected]
> Subject: RE: Had Derby 10.8.2.2 fail today and need some thoughts
>
> Hardware is Oracle M5000 running Solaris 10.   Database engine is controlled 
> through SMF of Solaris.   derby.properties is setup to always append to the 
> log, never truncate.  There definitely is not more than one process starting 
> the Network Server.
>
> Here is another tidbit.   Even though the log says the Derby engine was 
> shutting down, the process was still alive (prstat showed this) and the SMF 
> log also showed that the Network Server process never terminated.
>
>  From a previous capture of the "derby.log", it also appeared this happened 
> on the Feb 24'th as well, but seemed to recover and continue running.   
> Strange?
>
> On an separate thought in case it has to do with an out of memory condition, 
> I was using jvisualvm the other day and monitoring the heap/garbage 
> collection on a test system that tries to mimic this setup (Oracle M3000).   
> I did a heap dump an noticed many objects relating to XA transaction timeout 
> timers.   About 600K of these.   They seemed to be owned by 
> java.util.TimerTask.   I remember seeing some Derby issue updated yesterday 
> or the day before about java.util.TimerTask holding on to some things.   I 
> bring this up because recently we added the property to derby.properties to 
> cancel XA transactions if they take to long.   We probably do about 6M XA 
> transactions a day and I am wondering if maybe there is some sort of leak 
> here.   I will try my setup and look again tomorrow using jvisualvm.
>
> I appreciate your thoughts.
>
> Brett
> ________________________________________
> From: Mike Matrigali [[email protected]]
> Sent: Tuesday, February 28, 2012 6:40 PM
> To: [email protected]
> Subject: Re: Had Derby 10.8.2.2 fail today and need some thoughts
>
> Don't know by more info might help.  A catch 22 is that usually the top of 
> the derby.log usually has all the environment info that could be useful.
>
> Can you list OS, JVM.  I/O handling to the log is likely an OS thing.  What 
> properties do you have set, any special for error logging?
>
> I would likely look first for multiple starts and stops of derby and a 
> setting default for one of them to truncate the derby.log.
> Maybe leading to multiple processes trying to write to same derby.log.
>
> /mikem
>
> Bergquist, Brett wrote:
>> Our customer called and said the server was not working.   Before
>> restarting, I retrieved the derby.log and it seems strange because 
>> this was right at the top of the log:
>>
>>
>>
>> ----------------------------------------------------------------
>>
>> Tue Feb 28 15:33:58 EST 2012: Shutting down Derby engine
>>
>> ----------------------------------------------------------------
>>
>> Tue Feb 28 15:37:28 EST 2012 Thread[DRDAConnThread_245,5,main] (XID = 
>> 1482003981), (SESSIONID = 133257672), (DATABASE = csemdb), (DRDAID = 
>> ????????.??-4471791624540787385{335696}), Cleanup action starting
>>
>> Tue Feb 28 15:37:28 EST 2012 Thread[DRDAConnThread_245,5,main] (XID = 
>> 1482003981), (SESSIONID = 133257672), (DATABASE = csemdb), (DRDAID = 
>> ????????.??-4471791624540787385{335696}), Failed Statement is: null
>>
>> java.lang.NullPointerException
>>
>> Cleanup action completed
>>
>>
>>
>> The strange part is "Shutting down Derby engine" is the first thing in
>> the log.   The server was up and running since 2/16/2012 with no
>> problems with about 60M transactions processed, a backup performed each
>> night, etc.    And then this.
>>
>>
>>
>> So any thoughts on how the derby.log could be recreated and have this as
>> the first thing in the log?   I am thinking maybe an OutOfMemory
>> condition but all traces of whatever when wrong are gone, so I am 
>> trying to work backwards and see how the derby.log could be created 
>> with this as the first thing in the log.
>>
>>
>>
>> Thanks for any help!
>>
>>
>>
>> Brett
>>
>>
>>
>>
>>
>
>
>
>
>
>
>

RE: How to present a heap dump to the list - Was: Had Derby 10.8.2.2 fail today and need some thoughts

Reply via email to