Since you are on Solaris you should be able to run with the logs on all the
time. Here is a script template you can use to save log files. Just figure
out how often you wish to save the files and add a call to it from cron.
#! /usr/bin/ksh
#
############################################################
# Name: save_logs.sh
# Description: Save the log file(s) via a script
############################################################
#
AR_LOG_DIR={full path where log files are stored}
AR_SAVE_DIR={full path where you want to save logs at}
#
cur=`date +%H%M`
cd ${AR_LOG_DIR} || exit 1
#
[ -r arsql.log ] && cp arsql.log ${AR_SAVE_DIR}/arsql_${cur}.log ;
[ -r arfilter.log ] && cp arfilter.log ${AR_SAVE_DIR}/arfilter_${cur}.log ;
[ -r arapi.log ] && cp arapi.log ${AR_SAVE_DIR}/arapi_${cur}.log ;
#
cd ${AR_SAVE_DIR} || exit 1
#
netstat -a > netstat_${cur}.log
#
echo "prstat" > ps_${cur}.log
prstat -n25,10 -a 1 1 >> ps_${cur}.log
echo " " >> ps_${cur}.log
echo "vmstat" >> ps_${cur}.log
vmstat 1 2 >> ps_${cur}.log
echo " " >> ps_${cur}.log
echo "ps" >> ps_${cur}.log
ps -ef >> ps_${cur}.log
#
rm -f *${cur}.log.gz > /dev/null
gzip *${cur}.log
#
exit 0
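A crontab entry along these lines would call the script every 15 minutes. The install path is a placeholder and the interval is only an example. Solaris 10 cron does not accept the */15 step syntax, so the minutes are listed explicitly.

```
# minute hour day-of-month month day-of-week command
0,15,30,45 * * * * /path/to/save_logs.sh > /dev/null 2>&1
```

Because the file names carry only the hour and minute, each snapshot is overwritten 24 hours later, so the save directory holds a rolling day of data.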
Fred
-----Original Message-----
From: Action Request System discussion list(ARSList)
[mailto:[email protected]] On Behalf Of ZHANG, ERIC L
Sent: Thursday, January 27, 2011 4:26 PM
To: [email protected]
Subject: Re: Strange ARS Timeout Problem
Good idea. I just put a cron job on the ars server that runs traceroute
<db_server> every minute and appends the output to an output file. Waiting for
the next timeout.
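A minimal sketch of what such a cron-driven probe might look like. The host name, the output path and the function name log_probe are placeholders for illustration, not details from this thread. PROBE_CMD is overridable only so the function can be tried out with a harmless command first.

```shell
# Placeholders: adjust for your environment (none of these values come from the thread).
DB_HOST="${DB_HOST:-db_server}"
OUT_FILE="${OUT_FILE:-/tmp/traceroute_db.log}"
PROBE_CMD="${PROBE_CMD:-traceroute}"

# Append one timestamped probe result, plus its exit status, to OUT_FILE.
log_probe() {
    {
        echo "=== `date '+%Y-%m-%d %H:%M:%S'` ${PROBE_CMD} ${DB_HOST}"
        ${PROBE_CMD} "${DB_HOST}" 2>&1
        echo "=== exit status: $?"
    } >> "${OUT_FILE}"
}

# A script run from cron would end with a single call:
# log_probe
```

The timestamp header is what makes the output useful later: it lets each probe be lined up against the gap in the API log.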
-----Original Message-----
From: LJ LongWing [mailto:[email protected]]
Sent: Thursday, January 27, 2011 9:18 AM
Subject: Re: Strange ARS Timeout Problem
OK, I just completely re-read the original post. All indications save one are
that during that 5-minute interval the application server lost connectivity
with the DB server. The only exception appears to be the escalation thread,
which continued processing during that 5-minute window. So what I would do is
set up a cron job to run every 30 seconds or every minute, something along
those lines, that issues a traceroute between your Remedy server and your DB
server. My primary thought is that you are losing network connectivity, even
though the escalation server is still working. It's at least something you can
try and report back.
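One practical note: cron's finest granularity is one minute, so a 30-second interval needs a small loop inside the job. A sketch, where repeat_probe is a hypothetical helper and the host name and output path in the usage comment are placeholders:

```shell
# repeat_probe COUNT INTERVAL COMMAND [ARGS...]
# Run COMMAND COUNT times, sleeping INTERVAL seconds between runs.
repeat_probe() {
    count=$1
    interval=$2
    shift 2
    i=1
    while [ "$i" -le "$count" ]; do
        "$@"
        # No sleep after the last run, so the job finishes before the next cron tick.
        [ "$i" -lt "$count" ] && sleep "$interval"
        i=`expr "$i" + 1`
    done
}

# Called once a minute from cron, this gives a probe roughly every 30 seconds:
# repeat_probe 2 30 traceroute db_server >> /tmp/traceroute_db.log 2>&1
```

If a single probe can take longer than the interval while the network is down, the runs drift; for this kind of diagnosis that is usually acceptable.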
From: Action Request System discussion list(ARSList)
[mailto:[email protected]] On Behalf Of ZHANG, ERIC L
Sent: Wednesday, January 26, 2011 7:19 PM
To: [email protected]
Subject: Re: Strange ARS Timeout Problem
Yes, I did initial log analysis. As I said in the original posting, there was
a 5-minute gap in the API log, while no gap, wait, error, or long operation
showed up in the SQL log or the escalation log. All the SQL queries in the SQL
log were for user AR_ESCALATOR.
-----Original Message-----
From: Axton [mailto:[email protected]]
Sent: Wednesday, January 26, 2011 8:18 AM
Subject: Re: Strange ARS Timeout Problem
What do the logs say? I haven't seen that you've done analysis with the
logs. Is there a gap in time in the logs (indicating the server was not doing
anything)? Is there a gap in time in the logs (indicating a long operation
was running)?
On Tue, Jan 25, 2011 at 5:49 PM, ZHANG, ERIC L <[email protected]> wrote:
We have sent BMC tech support all the logs, including api, filter, sql,
escalation, thread, plug-in, arfork, and even pstack output taken during the
hang, and so far they haven't been able to identify the cause of the problem.
-----Original Message-----
From: Axton [mailto:[email protected]]
Sent: Monday, January 24, 2011 5:45 PM
Subject: Re: Strange ARS Timeout Problem
Try to get the api, filter, and sql logs leading up to the point where it
started hanging. Those are your best indicator. Also check the arerror.log
for crashes.
There are things that can cause behavior like this that the logs will indicate.
For example, try creating a computed group during production operations, or
importing a deployable application.
On Thu, Jan 20, 2011 at 3:10 PM, ZHANG, ERIC L <[email protected]> wrote:
Hi Listers.
We are experiencing intermittent timeouts with ARS. Without my doing
anything, the AR System returns to normal after about 5 minutes. All users
get timeouts (or an hourglass), but armonitor.log shows no process being
restarted.
This is the message showing in arerror.log:
Tue Jan 18 12:09:24 2011 Dispatch : Timeout during data retrieval due to busy
server -- retry the operation (server_name) ARERR - 93
Tue Jan 18 12:10:04 2011 Approve : Timeout during database query -- consider
using more specific search criteria to narrow the results, and retry the
operation (ARERR 94)
In the API log, it shows a 5-minute gap:
<API > <TID: 0000000004> <RPC ID: 0000000000> <Queue: Admin > <Client-RPC: 999999 > <USER: Remedy Application Service > /* Tue Jan 18 2011 12:06:16.2224 */-GLEWF OK
<API > <TID: 0000000004> <RPC ID: 0000000000> <Queue: Admin > <Client-RPC: 999999 > <USER: Remedy Application Service > /* Tue Jan 18 2011 12:11:16.0001 */+GLEWF ARGetListEntryWithFields -- schema OBJSTR:Class from Unidentified Client (protocol 12) at IP address
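Gaps like this can be found mechanically rather than by eye. The sketch below assumes each log entry sits on one line in the file and carries a "/* ... HH:MM:SS.ffff */" timestamp as in the sample above. find_gaps and the 60-second default threshold are illustrative choices. On Solaris, set AWK=nawk, since the old /usr/bin/awk lacks -v and match().

```shell
# find_gaps FILE [THRESHOLD_SECONDS]
# Report consecutive timestamped entries that are more than THRESHOLD apart.
# Compares time of day only, so a gap that spans midnight is not detected.
find_gaps() {
    ${AWK:-awk} -v limit="${2:-60}" '
    match($0, /[0-9][0-9]:[0-9][0-9]:[0-9][0-9]\.[0-9]+ \*\//) {
        stamp = substr($0, RSTART, 8)            # HH:MM:SS
        split(stamp, t, ":")
        now = t[1] * 3600 + t[2] * 60 + t[3]     # seconds since midnight
        if (seen && now - prev > limit)
            printf "gap of %d s between %s and %s (line %d)\n", now - prev, prevstamp, stamp, NR
        prev = now; prevstamp = stamp; seen = 1
    }' "$1"
}

# Example (the log path is a placeholder):
# find_gaps /path/to/arapi.log 120
```

Running the same check against the SQL and escalation logs shows quickly whether those threads stalled during the same window.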
Our DBA was monitoring the database at the time and found little activity in
the database. The activity shown in the SQL log during the timeout was all for
user AR_ESCALATOR, which means escalations were still running at the time.
This can also be verified from the escalation log.
When this occurs, CPU and RAM utilization drop dramatically to their lowest
levels on both the ARS server and the database server. There was no
application change in the last couple of months. The problem started about two
weeks ago. It can occur 3 times in a day, and sometimes the system works fine
for days without it occurring.
Our configuration/environment:
ARS: 7.1 patch 7
ITSM: 7.0.03 patch 9
SLM: 7.1 patch 2
SRM: 2.2 patch 4
Midtier: 7.6.03
ARS Server: Solaris 10 (16 GB of Physical Memory, 18 GB of SWAP, 8 CPUs) -
Dedicated to ARServer, ITSM, SLM, and SRM.
Midtier Server: Windows Server 2003 SP2 (2 CPUs, 2 GB of RAM) - Used only by
customers to submit service requests.
Database: Oracle: 10gR2 (remote)
The following are the thread settings in ar.conf:
Private-RPC-Socket: 390601 2 6
Private-RPC-Socket: 390603 2 2
Private-RPC-Socket: 390620 16 24 (FAST)
Private-RPC-Socket: 390626 8 16
Private-RPC-Socket: 390627 2 12
Private-RPC-Socket: 390635 24 30 (LIST)
Private-RPC-Socket: 390680 24 24
Private-RPC-Socket: 390693 2 4
Private-RPC-Socket: 390698 2 4
We have about 300 concurrent Remedy users during peak hours. ARServer runs
as a non-root process. The number of open file descriptors for arserverd
(~700) was well below the ulimit of 3072. The FAST and LIST threads never
reached their maximums.
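For anyone who wants to repeat the descriptor check, a rough sketch follows. It assumes a /proc file system with a per-process fd directory and permission to read it. count_fds is a hypothetical helper, and the pgrep lookup in the usage comment assumes a single arserverd process.

```shell
# count_fds PID: print the number of open file descriptors of a process.
count_fds() {
    ls "/proc/$1/fd" | wc -l | tr -d ' '
}

# Example:
# count_fds `pgrep -x arserverd`
# Note that "ulimit -n" reports the limit of the current shell,
# which is not necessarily the limit arserverd was started with.
```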
I have an open ticket with BMC Support but thought I might get a solution
more quickly from the ARSList here.
Thanks,
Eric