Well-constructed post. It gives me a lot to work with when thinking about your problem. I applaud you for that.

The interesting thing is that a restart of either the Tomcat servers or the ARServer fixes your issue. This throws an unknown variable into the mix (a monkey wrench, of sorts).

If you are inclined to dig into the Tomcat performance, I suggest enabling JMX and hooking up a JMX monitoring tool to the Tomcat instances (I prefer jvisualvm, which is part of the JDK) to see what the server is busy doing and where the memory allocation is going. You might find that the CPU time required for garbage collection gets higher and higher. If this is the case, depending on the configuration of the Linux VMs (CPU count, memory size, etc.) that house Tomcat, you may be able to play with the GC settings to alleviate this. It may point you to some issue within the midtier that you can't address, but if you know where the problem is you can work with support. If using JMX, you will want to use a local jvisualvm so that you can get CPU profiling statistics.

You might also look at how the midtier connects to the arserver. Do you use a proxy connection? Do you use a private queue? If you use a private queue, how many threads does it have and are those threads congested?

Axton Grams

On Thu, Mar 29, 2012 at 9:39 AM, L G Robinson <[email protected]> wrote:

> Hi Folks,
>
> I wrote to you a couple of months ago regarding a performance problem I am experiencing with my 7.6.03 Remedy system. Specifically, the problem is with my Mid-tier servers in that their performance degrades throughout the day. I have taken to restarting the arserver processes via cron every morning at 5:00 AM. This restores normal performance, but then the decline begins again and the Mid-tier service will eventually become unusable if I don't do the restart.
>
> The arserver is running on a Solaris 10 box with 16 GB of real memory.
> It is running:
>
> AR Server 7.6.03 Patch 002 201107191530
>
> The Midtier environment is 7.6.04 running on a bank of four RH Linux VMs running Tomcat behind a LVS (keepalived) load balancer. I have tried a variety of Java and Tomcat versions with several VM memory configurations and followed all of the BMC-published performance tuning recommendations.
>
> The application is a home-grown help desk application that we have been running here at NC State for years. We do not run any BMC out-of-the-box applications. In mid-December, we transitioned from ARS 5.1.2 to 7.6.03. We did not upgrade the existing system; we built a new system from the ground up, importing workflow and data. As we were not running Mid-tier prior to deploying 7.6, we have no baseline to compare to.
>
> The problem:
>
> When the system is performing correctly, one is able to initiate a search and click through the items returned in the browser with sub-second load times as you move from record to record. As the day progresses, the same search returns the same results but the time needed to move from record to record increases. By 5 PM (12 hours after the restart) the sub-second response has increased to approximately two seconds. If I test again at 11 PM, the time has increased further.
>
> Note that the performance of the Windows User tool remains constant and does not exhibit the performance degradation described for the Mid-tier.
>
> As mentioned above, the action that restores performance is to stop and start the arserver processes using the arsystem script. I have a cron-invoked script that stops the system, waits about 45 seconds for things to settle down, and then starts it up again. The hardware is not rebooted... just restarting the arserver.
>
> The other action that restores performance is to restart the Tomcat server process.
>
> Things we have tried so far:
>
> - Tuning the Mid-tier server and environment including increasing VM RAM, adjusting heap sizes, GC methods and some Mid-tier tweaking including pre-fetch.
>
> - Used alternative memory management on the Solaris server by adding the following to the arsystem script: LD_PRELOAD=libumem.so; export LD_PRELOAD. I have not yet tried to use the libumem tools to see if there really is a memory leak.
>
> - Patched the arserver with 7.6.03 Patch 2 which was purported to have a number of fixes related to memory leaks.
>
> Observations:
>
> - Since restarting the arserver seems to restore performance, I am inclined to think that the problem lies with the arserver. My first thought was a memory leak and I observed that the memory utilization of the arserverd process does steadily increase throughout the day. I don't know if this is normal or not. Note that this steady increase in memory utilization persists, even after applying Patch 2.
>
> - I have used Misi's very fine RRR|Log tools to check on the thread configuration. Based on RRR|Log, the thread configuration is quite adequate for the load that is generated during normal use.
>
> - To eliminate the network as a factor, I started a Tomcat/Mid-tier instance on the same Solaris box that is hosting the arserver processes. Even though this Mid-tier server is not in the public pool and no one is accessing it except for me, it also displays the same performance profile as the public pool Tomcat servers.
>
> - Restarting the Mid-tier service (by restarting Tomcat) also seems to restore performance. Obviously, I am not doing this on the public pool but I have done it on a test Tomcat server.
>
> - I have done some detailed analysis of API and SQL logs, comparing the times for a repeatable set of transactions in the morning and in the evening. The evening logs were captured during a time of minimal use.
> I observed the following:
>
> + The timings of the individual SQL commands were roughly the same across both log samples. This was measured by subtracting the SQL call beginning time from the SQL call ending time.
>
> + The timings of the individual API calls were also roughly the same across both log samples. In the cases where there was a significant difference, the afternoon times exceeded the morning times. This was measured by subtracting the API call beginning time from the API call ending time.
>
> + I observed the most significant difference in the timings between the evening and the morning in the time interval BETWEEN the API calls. That is, the time from the end of one API call until the beginning of the next API call (off hours with virtually no other load).
>
> I can't come up with any sort of a network issue that would manifest as a slow performance degradation. Considering that the two actions that restore performance both involve dropping some or all of the connections between the Mid-tier server and the arserver, I am still inclined to think that the problem is with the arserver.
>
> If you have actually read this far, I applaud you and appreciate your interest. If you have any suggestions on how to proceed to identify and resolve this issue, I would be most grateful.
>
> Thanks.
> Larry
>
> Larry Robinson
> Remedy Developer/Administrator
> NC State University
>
> _attend WWRUG12 www.wwrug.com ARSlist: "Where the Answers Are"_
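
The measurement described in the quoted message, the idle time from the end of one API call to the beginning of the next, could be automated along the following lines. This is only a minimal sketch under stated assumptions: the (start, end) timestamp pairs are assumed to have been extracted from the API log already, the `HH:MM:SS.mmm` layout is hypothetical and not the actual AR System log format, and the sample values are made up purely to exercise the code. The names `parse_time`, `inter_call_gaps` and `summarize` are illustrative.

```python
from datetime import datetime

# Assumed timestamp layout; adjust to whatever your extracted log data uses.
TIME_FORMAT = "%H:%M:%S.%f"


def parse_time(text):
    """Convert a timestamp string in the assumed layout to a datetime."""
    return datetime.strptime(text, TIME_FORMAT)


def inter_call_gaps(calls):
    """Return the seconds between the end of each call and the start of the next.

    `calls` is a list of (start, end) datetime pairs, one pair per API call.
    """
    ordered = sorted(calls)  # order by start time
    gaps = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        gaps.append((next_start - prev_end).total_seconds())
    return gaps


def summarize(label, raw_calls):
    """Print the gap count, mean gap and largest gap for one log sample."""
    calls = [(parse_time(start), parse_time(end)) for start, end in raw_calls]
    gaps = inter_call_gaps(calls)
    mean_gap = sum(gaps) / len(gaps) if gaps else 0.0
    largest = max(gaps, default=0.0)
    print(f"{label}: {len(gaps)} gaps, mean {mean_gap:.3f}s, max {largest:.3f}s")
    return gaps


# Made-up sample data, NOT taken from any real log.
morning = [
    ("05:10:00.100", "05:10:00.150"),
    ("05:10:00.200", "05:10:00.240"),
    ("05:10:00.300", "05:10:00.350"),
]
evening = [
    ("23:00:00.100", "23:00:00.150"),
    ("23:00:00.900", "23:00:00.940"),
    ("23:00:01.700", "23:00:01.750"),
]

summarize("morning", morning)
summarize("evening", evening)
```

Running the same transaction set through this in the morning and in the evening gives one gap figure per sample to track over the day. If the call durations stay flat while the gaps grow, the time is being spent outside the API calls themselves, which is worth comparing against what a JMX profiler shows the Mid-tier JVM doing in those intervals.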

